Taking Frame from Video vs Taking a Photo - opencv

My specific question is: What are the drawbacks to using a snipped frame from a video vs taking a photo?
Details:
I want to use frames from live video streams to replace taking pictures because it is faster. I have already researched and considered:
Videos need faster shutter speed, leading to higher possibility of blurring
Faster shutter speed also means less exposure to light, leading to potentially darker images
A snipped frame from a video will probably be lower resolution (although perhaps the video resolution can be turned up to compensate for this?)
Video might take up more memory -- I am still exploring the details with another post (What is being stored and where when you use cv2.VideoCapture()?)
Anything else?
I will reword my question to make it (possibly) easier to answer: What changes must I make to a "snip frame from video" process to make the result equivalent to taking a photo? Are these changes worth it?

The maximum resolution in picamera is 2592x1944 for still photos and 1920x1080 for video recording. Another issue to take into account is that you cannot receive all formats from VideoCapture, so converting the YUV frame to JPG becomes your responsibility. OpenCV can handle this, but it takes considerable CPU time and memory.
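For what it's worth, a minimal OpenCV sketch of the "snip a frame from the video stream" path might look like this (the device index 0, the 1920x1080 request and the JPEG quality value are illustrative assumptions; the driver may silently pick a different mode):

    import cv2

    cap = cv2.VideoCapture(0)                   # open the camera's video stream
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)     # request a video-mode resolution
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)    # (the driver may choose a nearby mode)

    ok, frame = cap.read()                      # BGR frame decoded from the stream
    if ok:
        # JPEG conversion is now our job; this costs CPU time and memory
        ok, jpg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 90])
        if ok:
            jpg.tofile("snapshot.jpg")

    cap.release()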

Related

Is there a quality difference between output of AVCaptureMovieFileOutput and AVCaptureVideoDataOutput?

In the process of capturing a light trail photo, I noticed that for fast moving objects there is slightly more discontinuity between successive frames if I use the sample buffers from AVCaptureVideoDataOutput than if I record a movie, extract the frames, and run the same algorithm.
Is there a refresh rate/frame rate difference if the two modes are used?
A colleague with experience in professional photography claims that there is a visible lag even in Apple's default camera app when comparing the preview in Photo mode and Video mode, but it is not something very obvious to me.
Furthermore, I am actually capturing video at a low frame rate (close to highest exposure)
To conclude these experiments, I need to know whether there is any definitive proof to confirm or disprove this.

How to estimate bandwidth / speed requirements for real-time streaming video?

For a project I'm working on, I'm trying to stream video to an iPhone through its headphone jack. My estimated bitrate is about 200kbps (if I'm wrong about this, please ignore that figure).
I'd like to squeeze as much performance out of this bitrate as possible, and sound is not important for me, only video. My understanding is that to stream real-time video I will need to encode it with some codec on the fly and send compressed frames to the iPhone for it to decode and render. Based on my research, it seems that H.265 is one of the most space-efficient codecs available, so I'm considering using that.
Assuming my basic understanding of live streaming is correct, how would I estimate the FPS I could achieve for a given resolution using the H.265 codec?
The best solution I can think of is to take a video file, encode it with H.265 and trim it to 1 minute of length to see how large the file is. The issue I see with this approach is that I think my calculations would include some overhead from the video container format (AVI, MKV, etc.) and from the audio channels that I don't care about.
I'm trying to stream video to an iPhone through its headphone jack.
Good luck with that. Headphone jack is audio only.
My estimated bitrate is about 200kbps
At what resolution? 320x240?
I'd like to squeeze as much performance out of this bitrate as possible and sound is not important for me, only video.
Then drop the sound streams altogether. Really though, 200kbit isn't enough for video of any reasonable size or quality.
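For a rough sense of scale, a back-of-the-envelope per-frame budget (the 15 fps figure is purely an assumed example):

    bitrate_bps = 200_000                 # the 200 kbps estimate from the question
    fps = 15                              # assumed frame rate, just for illustration
    bits_per_frame = bitrate_bps / fps    # ~13,333 bits
    print(bits_per_frame / 8 / 1024)      # ~1.6 KiB per frame on average, before codec overhead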
Assuming my basic understanding of live streaming is correct, how would I estimate the FPS I could achieve for a given resolution using the H.265 codec?
Nobody knows, because you've told us almost nothing about what's in this video. The bandwidth required for the video is a product of many factors, such as:
Resolution
Desired Quality
Color Space
Visual complexity of the scene
Movement and scene changes
Tweaks and encoding parameters (fast start? low latency?)
You're going to have to decide what sort of quality you're willing to accept, and decide subjectively what the balance between that quality and frame rate is. (Remember too that if there isn't much going on, you basically get frames for free since they take very little bandwidth. Experiment.)
The best solution I can think of is to take a video file, encode it with H.265 and trim it to 1 minute of length to see how large the file is.
Take many videos, typical of what you'll be dealing with, and figure it out from there.
The issue I see with this approach is that I think my calculations would include some overhead from the video container format (AVI, MKV, etc) and from the audio channels that I don't care about.
Your video stream won't have a container at all? Not even TS? You can use FFmpeg to dump the raw stream data for you.
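As a sketch of that measurement, assuming a representative clip named sample.mp4 and a 60-second cut: -an drops the audio and -f hevc writes a raw, container-less HEVC elementary stream, so the resulting size reflects video only:

    import os
    import subprocess

    # Re-encode one minute of a representative clip to a raw HEVC elementary
    # stream: -an drops audio, -f hevc writes the bitstream without a container.
    subprocess.run([
        "ffmpeg", "-y", "-i", "sample.mp4", "-t", "60",
        "-an", "-c:v", "libx265", "-b:v", "200k",
        "-f", "hevc", "raw_video.hevc",
    ], check=True)

    size_bits = os.path.getsize("raw_video.hevc") * 8
    print("effective video-only bitrate: %.0f kbps" % (size_bits / 60 / 1000))

Repeat this over several clips typical of your content; the per-clip variation tells you how much headroom you need.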

Take a photo during video input

I'm currently trying to take an image in the best quality while capturing video at a lower quality. The problem is that I'm using the video stream to check whether faces are in front of the cam, and this needs lots of resources, so I'm using a lower-quality video stream; if any faces are detected, I want to take a photo in high quality.
Best regards and thanks for your help!
You cannot have multiple capture sessions, so at some point you will need to swap to a higher resolution. First, you are saying that face detection takes too many resources when using high-res snapshots. Why not simply down-sample the image and keep using high resolution all the time (send the down-sampled one to the face detection, display the high-res one)?
I would start with Apple's most common graphics context and try to downscale there. If that takes too much CPU you could try to do the same on the GPU (find a library that does it or write a simple program), or you could even try simply dropping odd lines and columns of the image as raw data. In any of those cases, note that you probably do not need the face detection on the same thread as the display, and you most likely don't even need a high frame rate for the detection (display the camera at full FPS but update the face recognition at 10 FPS, for instance); a rough sketch of this is included below.
Another thing you can do is simply run the whole thing in low res; then, when you need to take the image, stop the session, start a high-res session, take the shot, and swap back to low res for face detection.
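The first suggestion (detect on a down-sampled copy, keep the full-resolution frame for the actual shot) is easy to prototype. Here is a rough OpenCV sketch, where the Haar cascade, the 4x scale factor and running detection on every frame are illustrative choices rather than iOS-specific advice:

    import cv2

    # Illustrative setup: a bundled Haar cascade and the default camera.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(0)
    scale = 4  # detect on a 4x down-sampled copy, keep the full-res frame around

    while True:
        ok, frame = cap.read()          # full-resolution frame (display / save this)
        if not ok:
            break
        small = cv2.resize(frame, None, fx=1 / scale, fy=1 / scale)
        faces = detector.detectMultiScale(cv2.cvtColor(small, cv2.COLOR_BGR2GRAY))
        if len(faces) > 0:
            cv2.imwrite("high_res_capture.jpg", frame)   # the "photo" in full quality
            break

    cap.release()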

How does streaming stream 30 480x640 images over a 2mbit/s line

I'm having a strange realization while working on a project.
I created a streaming solution where I stream an image with the resolution 480x640, totaling 307,200 pixels, and every pixel contains 32 bits of data; by my calculations this means that every frame totals 1.2 MB of data, which means that 30fps would require a 36 MB/s line.
So, to my question: how does a streaming solution stream 30fps over, for example, a 2 Mbit/s line?
I'm guessing the same question can probably be used to explain how a JPG image at 480x640 resolution takes up less than 100 KB.
Compression is your friend.
I don't know the specifics of your solution, but a few assumptions can be made.
First off, even if you send each frame as a full frame, they should be compressed. Even lossless compression should get you some pretty good compression rates, but if you go with something lossy (like jpg) then you can get even more.
But that's not all you get. Any good video codec should provide significant compression as well. Parts of the image that don't change between frames don't need to be sent at all, and other parts can be compressed nicely too (I don't know the specifics of the compression used, but a lot is done to squeeze the data down).
This all adds up to a lot of savings over sending a full 32bit bitmap for every frame.
Compression is a very broad topic. Just to get an idea, try reading the Wikipedia page about image compression.
As a very basic solution to your problem, I would personally JPEG-encode the first frame, then JPEG-encode the differences between consecutive frames.
For JPEG compression there are many libraries providing the functionality, without the need to implement it yourself.
If you are not so interested in the quality, you can also subsample the video, for example obtaining frames at a resolution of 240x320.
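A crude sketch of that first-frame-plus-differences idea, just to compare sizes (a real codec does the differencing and motion compensation far better; the input file name is an assumption):

    import cv2

    cap = cv2.VideoCapture("input.avi")   # any representative test clip
    ok1, frame1 = cap.read()
    ok2, frame2 = cap.read()
    cap.release()

    if ok1 and ok2:
        raw_bytes = frame1.nbytes                          # uncompressed BGR frame size
        _, jpg_full = cv2.imencode(".jpg", frame1)         # intra-coded ("key") frame
        _, jpg_diff = cv2.imencode(".jpg", cv2.absdiff(frame2, frame1))  # frame difference
        print("raw: %d  jpeg frame: %d  jpeg diff: %d bytes"
              % (raw_bytes, len(jpg_full), len(jpg_diff)))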

iOS video playback

In my application I need to play video in an unusual way.
Something like an interactive player for special purposes.
Main issues here:
video resolution can be from 200x200 px up to 1024x1024 px
I need the ability to change the playback speed from -60 FPS to 60 FPS (the video should play slower or faster depending on the selected speed; negative means the video should play backwards)
I need to draw lines and objects over the video and scale them with the image
I need the ability to zoom the image and pan it if its content is larger than the screen
I need the ability to change the brightness and contrast and to invert the colors of this video
Right now I am doing the following:
I split my video into JPG frames
created a timer that fires N times per second (playback speed control)
on each timer tick I draw a new texture (the next JPG frame) with OpenGL
for zoom and pan I use OpenGL ES transformations (translate, scale)
Everything looks fine as long as I use 320x240 px, but if I use 512x512 px my play rate drops. Maybe it is a timer behaviour problem, maybe OpenGL. Sometimes, if I try to open big textures at a high play rate (more than 10-15 FPS), the application just crashes with memory warnings.
What is the best practice for solving this issue? In which direction should I dig? Maybe cocos2d or another game engine would help me? Maybe JPG is not the best format for textures and I should use PNG, PVR, or something else?
Keep the video data as a video and use AVAssetReader to get the raw frames. Use kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange as the colorspace, and do YUV->RGB colorspace conversion in GLES. It will mean keeping less data in memory, and make much of your image processing somewhat simpler (since you'll be working with luma and chroma data rather than RGB values).
You don't need to bother with Cocos 2d or any game engine for this. I strongly recommend doing a little bit of experimenting with OpenGL ES 2.0 and shaders. Using OpenGL for video is very simple and straightforward, adding a game engine to the mix is unnecessary overhead and abstraction.
When you upload image data to the textures, do not create a new texture every frame. Instead, create two textures: one for luma, and one for chroma data, and simply reuse those textures every frame. I suspect your memory issues are arising from using many images and new textures every frame and probably not deleting old textures.
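For reference, the YUV-to-RGB step mentioned above is just a per-pixel linear transform. Here is a NumPy sketch of the commonly quoted BT.601 video-range matrix, the same math a fragment shader would perform (check whether your source is actually BT.601 or BT.709 before relying on these coefficients):

    import numpy as np

    def yuv_to_rgb(y, cb, cr):
        # BT.601 video-range YCbCr -> RGB (coefficients are the commonly quoted values)
        y = 1.164 * (np.asarray(y, dtype=np.float32) - 16.0)
        cb = np.asarray(cb, dtype=np.float32) - 128.0
        cr = np.asarray(cr, dtype=np.float32) - 128.0
        r = y + 1.596 * cr
        g = y - 0.392 * cb - 0.813 * cr
        b = y + 2.017 * cb
        return np.clip(np.stack([r, g, b], axis=-1), 0, 255).astype(np.uint8)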
JPEG frames will be incredibly expensive to uncompress. First step: use PNG.
But wait! There's more.
Cocos2D could help you mostly through its great support for sprite sheets.
The biggest help, however, may come from packed textures a la TexturePacker. Using PVR.CCZ compression can speed things up by insane amounts, enough for you to get better frame rates at bigger video sizes.
Vlad, the short answer is that you will likely never be able to get all of the features you have listed working at the same time. Playing 1024x1024 video at 60 FPS is really going to be a stretch; I highly doubt that iOS hardware is going to be able to keep up with those kinds of data transfer rates at 60 FPS. Even the h.264 hardware on the device can only do 30 FPS at 1080p. It might be possible, but then layering graphics rendering over the video while also expecting to adjust the brightness/contrast is just too many things at the same time.
You should focus on what is actually possible instead of attempting to do every feature. If you want to see an example Xcode app that pushes iPad hardware right to the limits, have a look at my Fireworks example project. This code displays multiple already decoded h.264 videos on screen at the same time. The implementation is built around CoreGraphics APIs, but the key thing is that Apple's implementation of texture uploading to OpenGL is very fast because of a zero-copy optimization. With this approach, a lot of video can be streamed to the device.
