Although i've searched SO and read documentation multiple times on AVCaptureConnection, AVCaptureSession, AVCaptureVideoPreviewLayer, AVCaptureDevice, AVCaptureInput/Output … i'm still confused about all this AV stuff. When it comes to this, it's one big pile of abstract words to me, that don't make much sense. I'm asking to shed some light on the subject for me here.
So, can anyone explain coherently in plain english the logic of proper setup and use of the media devices? What is AVCaptureVideoPreviewLayer? What is AVCaptureConnection? Input/Output?
I want to catch the basic idea the people who made this stuff had while making it.
Thanks
I wish I had more time to write a more thorough reply. Here are some simplified basics:
In order to work with audio and video coming from the hardware, destined for the screen or files, you need to setup an AVCaptureSession that helps coordinate the sources and the destinations, using AVCaptureConnections. You use the session instance to start and stop the process, along with setting some output properties like bitrate and quality. You use the AVCaptureConnection instance(s) to control the connection between an AVCaptureInputPort and an AVCaptureOutputPort (or AVCaptureVideoPreviewLayer), such as monitoring input levels of sounds or setting the orientation of the video.
AVCaptureInputPort are different inputs from AVCaptureDevice - which is where your video or audio is coming from, such as the camera or the microphone. You will normally look through all available devices and choose those that have the properties you are looking for, such as if they are audio, or if they are the front-facing camera.
AVCaptureOutput is where the AV is sent - it might be a file or a routine that allows you to process the data in real-time, etc.
AVCaptureVideoPreviewLayer is an OpenGL layer that is optimized for very fast rendering of the output of the selected video input device (front or back camera). You typically use this to show your user what input you are working with - sort of like a camera viewfinder.
If you are going to use this stuff, then you must read Apple's AV Foundation Programming Guide
Here's an image that may help you some more (from above-mentioned doc):
A more detailed view:
Related
I'm looking for a way to create a audio bars visualizer similar to this in iOS.
Every white bar will move up and down depending of audio wave. I'm really lost because haven't much experience dealing with audio in Objective-c.
EDIT: What i'm seeking is what Overcast's app does on its visualizer (the group of vertical orange bars on the lower part of the podcast's image)
Anyone can help?
Thanks
EDIT: Thanks to Tomer's answer I finally made it. First I did this tutorial in order to make it all clear. Then I created my own VisualizerView for my project, you can find it in this gist. Maybe is not perfect but it does what I needed to do.
Generally, you have a few options if you want to get an idea of what something sounds like in iOS:
Use the simple AVAudioPlayer audio player, and then use the [audioPlayer averagePowerForChannel:] method to get the avarage audio level for the current moment. Check out this tutorial.
Use the Audio Queue API, which lets you send whatever audio you want to the speaker: You would read audio from your source and fill the buffers with it every time. (If you're reading from a file, use AVAssetReader) This way you always know exactly what waveform you're playing, so you can, for example, calculate its avarage power or process it in other ways like FFT. Then you'd update the bars accordingly.
EDIT: The standard way of doing such a thing is to use the Fast Fourier Transform (FFT) - it extracts frequency information from a sound. Here's a good example of using it on iOS (Apple's guide here). But, of course, to use it you have to know exactly what waveform you're playing every time, so you'd probably want to use a lower-level API such as Audio Queue.
I am recording videos and playing them back using AVFoundation. Everything is perfect except the hissing which is there in the whole video. You can hear this hissing in every video captured from any iPad. Even videos captured from Apple's inbuilt camera app has it.
To hear it clearly, you can record a video in a place as quiet as possible without speaking anything. It can be very easily detected through headphones and keeping volume to maximum.
After researching, I found out that this hissing is made by preamplifier of the device and cannot be avoided while recording.
Only possible solution is to remove it during post processing of audio. Low frequency noise can be removed by implementing low pass filter and noise gates. There are applications and software like Adobe Audition which can perform this operation. This video shows how it is achieved using Adobe Audition.
I have searched Apple docs and found nothing which can achieve this directly. So I want to know if there exists any library, api or open source project which can perform this operation. If not, then how can I start going in right direction because it does looks like a complex task.
I'm looking for a tips to develop an application for iPhone/iPad that will be able to process video (let's consider only local files stored on the device for simplicity) and play it in real-time. For example you can choose any movie and choose "Old movie" filter and want it like on old lamp TV.
In order to make this idea real i need to implement two key features:
1) Grab frames and audio stream from a movie file and get access to separate frames (I'm interested in raw pixel buffer in BGRA or at least YUV color space).
2) Display processed frames somehow. I know it's possible to render processed frame to OpenGL texture, but i would like to have more powerful component with playback controls. Is there any class of media player that supports playing custom image and audio buffers?
The processing function is done and it's fast (less than duration on one frame).
I'm not asking for ready solution, but any tips are welcome!
Answer
Frame grabbing.
It seems the only way to grab video and audio frames is to use AVAssetReader class. Although it's not recommended to use for real-time grabbing it does the job. In my tests on iPad2 grabbing single frame needs about 7-8 ms. Seeking across the video is a tricky. Maybe someone can point to more efficient solution?
Video playback. I've done this with custom view and GLES to render a rectangle texture with a video frame inside of it. As far as i know it's the fastest way to draw bitmaps.
Problems
Need to manually play a sound samples
AVAssetReader grabbing should be synchronized with a movie frame rate. Otherwise movie will go too fast or too slow.
AVAssetReader allows only continuous frame access. You can't seek forward and backward. Only proposed solution is to delete old reader and create a new with trimmed time range.
This is how you would load a video from the camera roll..
This is a way to start processing video. Brad Larson did a great job..
How to grab video frames..
You can use AVPlayer+ AVPlayerItem, it provide you a chance to apply a filter on the display image.
I need to simultaneously display a video that is playing in my applciation, full screen on a larger monitor. On some video cards, this is called Theater mode and is configured using a tool that the card manufacturer supplies.
I would like to do this with only software. Can I do this with DirectX?
My idea is to take the currently active video playing using DirectShow and repaint it on a second display (as configured by the user) in full screen mode.
What technologies or methods would I use for this?
The straightforward way is to split yet encoded video into two branches and use two video renderer set to present video on different monitors. One renderer could be a part of your application UI, the other could expand full screen on the large secondary monitor.
Splitting encoded video give you an option to still leverage hardware assisted decoding (DXVA) if available. You might prefer to use software only decoder and split already decoded video - this is also going to work.
You might additionally want to implement filter which would separately temporarily disable one or the other renderer, such as for example by stopping passing media samples through.
Another thing you can do is to use bridging to even more flexibly control the renderers and be able to detach them from media source.
I want to make something like they have at US dmv's where you sit down and it takes your picture, maybe like photobooth.
I want to connect a high end camera via usb, fire the camera and get the picture.
There's the Picture Transfer Protocol http://en.wikipedia.org/wiki/Picture_Transfer_Protocol a nastly little thing. All the cameras I held in my hands so far, claiming they had proper PTP support failed it somewhere. But in theory one can use PTP to remote control a camera, i.e. trigger the shutter, retrieve the picture and so on.
Rater than reimplementing the whole thing I recommend you get some readily usable PTP library. There are some open source ones listed on http://ptp.sourceforge.net
The easiest method is probably to use OpenCV: http://opencv.willowgarage.com/wiki/
If you need a high end camera - most digital SLRs have a tethered mode where you can control the camera, fire the shutter and retrieve the image data. Each camera maker has a proprietary (but normally free) sdk.
For a webcam type camera - these normally run in video mode, you simply grab an image out f the video stream - as PaulR says - use openCV