I have put together a sample project, https://github.com/liuxuan30/TestH264.git, that uses VideoToolbox to build a sample H.264 decoder and display a stream file captured from a camera.
The VideoToolbox-based H.264 decoder is copied from the internet; I didn't write it. When I play my H.264 stream file, it plays back too fast compared to ffmpeg or ffplay, which both play it at normal speed.
How can I fix this behaviour? Thanks.
This happens because of the kCMSampleAttachmentKey_DisplayImmediately attachment:
If this key is present, the sample should be displayed as soon as possible rather than
according to its presentation timestamp. Use this attachment at run time to request this
behavior from a display pipeline such as the AVSampleBufferDisplayLayer class.
This attachment is not written to media files.
from Apple's documentation
So you have two options for displaying frames:
Display immediately, which is probably good for a real-time stream when you need to show each frame as soon as possible
Display each frame at its presentation timestamp
"comparing to ffmpeg or ffplay, which both played back at a normal speed"
ffplay and ffmpeg most likely use the presentation timestamps at this point.
I get the same result as you with your test H.264 file, but that happens because all the decoded frames arrive at once, so the decoder displays them immediately.
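If you want the frames to be shown according to their timestamps instead, here is a rough Swift sketch of my own (not code from the linked project; it assumes you are enqueueing into an AVSampleBufferDisplayLayer): clear the display-immediately attachment and give the layer a control timebase.

    import AVFoundation
    import CoreMedia

    // Sketch: clear the "display immediately" attachment and drive the layer
    // with a control timebase so frames are shown at their PTS.
    func enqueueUsingTimestamps(_ sampleBuffer: CMSampleBuffer,
                                on layer: AVSampleBufferDisplayLayer) {
        // 1) Make sure kCMSampleAttachmentKey_DisplayImmediately is not true.
        if let attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer,
                                                                     createIfNecessary: true),
           CFArrayGetCount(attachments) > 0 {
            let dict = unsafeBitCast(CFArrayGetValueAtIndex(attachments, 0),
                                     to: CFMutableDictionary.self)
            CFDictionarySetValue(dict,
                Unmanaged.passUnretained(kCMSampleAttachmentKey_DisplayImmediately).toOpaque(),
                Unmanaged.passUnretained(kCFBooleanFalse).toOpaque())
        }

        // 2) Give the layer a timebase so it can schedule frames by timestamp.
        if layer.controlTimebase == nil {
            var timebase: CMTimebase?
            CMTimebaseCreateWithSourceClock(allocator: kCFAllocatorDefault,
                                            sourceClock: CMClockGetHostTimeClock(),
                                            timebaseOut: &timebase)
            if let timebase = timebase {
                layer.controlTimebase = timebase
                // Start the clock at the first frame's PTS, playing at 1x.
                CMTimebaseSetTime(timebase,
                                  time: CMSampleBufferGetPresentationTimeStamp(sampleBuffer))
                CMTimebaseSetRate(timebase, rate: 1.0)
            }
        }

        layer.enqueue(sampleBuffer)
    }

Note that this only helps if the CMSampleBuffers you build from the elementary stream actually carry presentation timestamps (CMSampleTimingInfo); a raw Annex B file has no container timestamps, so you may have to synthesize them from the known frame rate.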
You can watch this video for more information about VideoToolbox framework:
Direct Access to Video Encoding and Decoding
Related
I am building an application that needs to do real-time audio recording. I am using Swift for the project, so I am unable to use the Novocaine library (it has some Obj-C++ code).
What I need is to get small chunks of the audio recording in real time, which I can process or send to my websocket. Is there a Swift library I can use to achieve this?
In addition to getting the live audio from the microphone, I also need to show a real-time waveform.
Start recording
Get an event for every few bytes of recorded data, so I can send those bytes to my websocket.
Show a waveform for the audio.
Let me know.
You do not need any third-party tools to get audio from the mic; it can be set up easily using AVAudioEngine. However, to minimise network traffic I suggest using lame to compress the raw PCM audio stream into mp3.
Here you can find a project with minimal functionality for getting mic input and compressing it into mp3. In that example project the mp3 is stored in the Documents folder, so you can try it and listen to make sure it works.
From this point you can take the mp3 buffer and send it via the socket. You can also play with the lame settings to change the quality, etc.
There is another branch called no-lame where the same functionality is implemented without lame encoding. Look here.
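For reference, a minimal AVAudioEngine sketch of my own (not taken from the linked project) that delivers PCM chunks to a callback, from which you can feed a websocket, an mp3 encoder, or a waveform view; the buffer size is an illustrative choice:

    import AVFoundation

    let engine = AVAudioEngine()
    let input = engine.inputNode
    let format = input.outputFormat(forBus: 0)

    // Roughly 0.1 s of audio per callback at the hardware sample rate.
    input.installTap(onBus: 0, bufferSize: 4096, format: format) { buffer, _ in
        // buffer.floatChannelData points at the raw PCM samples: send them to
        // your websocket / encoder, and use them to update the waveform.
        if let channel = buffer.floatChannelData?[0] {
            let frames = Int(buffer.frameLength)
            var peak: Float = 0
            for i in 0..<frames { peak = max(peak, abs(channel[i])) }
            print("peak level for this chunk:", peak)
        }
    }

    do {
        // Remember to configure AVAudioSession for recording and to add the
        // microphone usage description to Info.plist before starting.
        try engine.start()
    } catch {
        print("Could not start audio engine:", error)
    }

The requested buffer size is only a hint; the system may deliver chunks of a different length, so treat each callback as "whatever arrived since last time".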
TL;DR
I want to use FFmpeg on an iOS device to convert fMP4 fragments to TS segments (for HLS) as the fragments are being written.
Why?
I'm trying to achieve live uploading on iOS while maintaining a seamless, HD copy locally.
What I've tried
Rolling AVAssetWriters where each writes for 8 seconds, then concatenating the MP4s together via FFmpeg.
What went wrong - There are blips in the audio and video at times. I've identified 3 reasons for this.
1) Priming frames for audio, written by the AAC encoder, create gaps.
2) Since video frames are 33.33 ms long and audio frames are about 22 ms long, it's possible for them not to line up at the end of a file.
3) The lack of frame-accurate encoding, which is present on Mac OS but not available for iOS. Details here.
FFmpeg muxing a large video-only MP4 file, plus raw audio, into TS segments. The work was based on the Kickflip SDK.
What went wrong - Every once in a while an audio-only file would get uploaded, with no video whatsoever. I was never able to reproduce it in-house, but it was pretty upsetting to our users when they didn't record what they thought they did. There were also issues with accurate seeking on the final segments, almost as if the TS segments were incorrectly timestamped.
What I'm thinking now
Apple was pushing fMP4 at WWDC this year (2016) and I hadn't looked into it much at all before that. Since an fMP4 file can be read and played while it's being written, I thought it would be possible for FFmpeg to transcode the file as it's being written as well, as long as we hold off on sending the bytes to FFmpeg until each fragment within the file is finished.
However, I'm not familiar enough with the FFmpeg C API; I only used it briefly in attempt #2.
What I need from you
Is this a feasible solution? Is anybody familiar enough with fMP4 to know if I can actually accomplish this?
How will I know that AVFoundation has finished writing a fragment within the file so that I can pipe it into FFmpeg?
How can I take data from a file on disk, a chunk at a time, pass it into FFmpeg, and have it spit out TS segments?
Strictly speaking, you don't need to transcode the fMP4 if it contains H.264 + AAC; you just need to repackage the sample data as TS (using ffmpeg -codec copy, or GPAC).
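For example, something along these lines (the exact options depend on your ffmpeg build and on how you feed the fragments in; this only illustrates the repackaging idea):

    ffmpeg -i fragmented.mp4 -c copy -f hls -hls_time 6 -hls_list_size 0 out.m3u8

With older ffmpeg builds you may need to add -bsf:v h264_mp4toannexb for the MP4-to-TS bitstream conversion; newer builds insert it automatically when copying into an MPEG-TS container.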
With respect to alignment (your points 1 and 2), I suppose this all depends on your encoder settings (frame rate, sample rate and GOP size). It is certainly possible to make sure that audio and video align exactly at fragment boundaries (see, for example, this table). If you're targeting iOS, I would recommend using HLS protocol version 3 (or 4), which allows timing to be represented more accurately. This also allows you to stream audio and video separately (non-multiplexed).
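To make the alignment point concrete (my own numbers, not from the linked table): at 48 kHz an AAC frame is 1024 samples ≈ 21.33 ms, so a 3.2 s fragment contains exactly 150 audio frames (150 × 1024 = 153,600 samples = 3.2 s) and, at 30 fps, exactly 96 video frames (96 / 30 = 3.2 s), so both tracks end precisely on the fragment boundary.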
I believe ffmpeg should be capable of pushing a live fMP4 stream (i.e. using a long-running HTTP POST), but playout requires origin software to do something meaningful with it (i.e. restream it as HLS).
I've got experience with building iOS apps but don't have experience with video. I want to build an iPhone app that streams real time video to a server. Once on the server I will deliver that video to consumers in real time.
I've read quite a bit of material. Can someone let me know if the following is correct and fill in the blanks for me.
To record video on the iPhone I should use the AVFoundation classes. When using AVCaptureSession, the delegate method captureOutput:didOutputSampleBuffer:fromConnection: gives me access to each frame of video. Now that I have the video frame, I need to encode it.
I know that the AVFoundation classes only offer H.264 encoding via AVAssetWriter, and not via a class that easily supports streaming to a web server. Therefore, I am left with writing the video to a file.
I've read other posts saying that two AVAssetWriters can be used to write 10-second blocks and then NSStream those blocks to the server. Can someone explain how to code two AVAssetWriters working together to achieve this? If anyone has code, could they please share it?
You are correct that the only way to use the hardware encoders on the iPhone is by using the AVAssetWriter class to write the encoded video to a file. Unfortunately the AVAssetWriter does not write the moov atom to the file (which is required to decode the encoded video) until the file is closed.
Thus one way to stream the encoded video to a server would be to write 10 second blocks of video to a file, close it, and send that file to the server. I have read that this method can be used with no gaps in playback caused by the closing and opening of files, though I have not attempted this myself.
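To give an idea of that rotation, here is a rough Swift sketch of my own (the names, the video-only writer, the output settings and the 10-second interval are all illustrative; real code also needs an audio input and error handling):

    import AVFoundation

    // Rotate AVAssetWriters: append sample buffers to the current writer, and
    // every N seconds finish it, hand the closed file off for upload, and
    // start a new writer.
    final class SegmentWriter {
        private var writer: AVAssetWriter?
        private var videoInput: AVAssetWriterInput?
        private var segmentStart: CMTime = .invalid
        private let segmentDuration = CMTime(seconds: 10, preferredTimescale: 600)
        var onSegmentFinished: ((URL) -> Void)?

        func append(_ sampleBuffer: CMSampleBuffer) {
            let pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)

            if writer == nil {
                startNewSegment(at: pts)
            } else if CMTimeSubtract(pts, segmentStart) >= segmentDuration {
                finishCurrentSegment()
                startNewSegment(at: pts)
            }

            if let input = videoInput, input.isReadyForMoreMediaData {
                input.append(sampleBuffer)
            }
        }

        private func startNewSegment(at time: CMTime) {
            let url = FileManager.default.temporaryDirectory
                .appendingPathComponent(UUID().uuidString + ".mp4")
            guard let newWriter = try? AVAssetWriter(outputURL: url, fileType: .mp4) else { return }

            let settings: [String: Any] = [
                AVVideoCodecKey: AVVideoCodecType.h264,
                AVVideoWidthKey: 1280,
                AVVideoHeightKey: 720
            ]
            let input = AVAssetWriterInput(mediaType: .video, outputSettings: settings)
            input.expectsMediaDataInRealTime = true
            newWriter.add(input)

            newWriter.startWriting()
            newWriter.startSession(atSourceTime: time)

            writer = newWriter
            videoInput = input
            segmentStart = time
        }

        private func finishCurrentSegment() {
            guard let finishingWriter = writer else { return }
            videoInput?.markAsFinished()
            let url = finishingWriter.outputURL
            finishingWriter.finishWriting { [weak self] in
                self?.onSegmentFinished?(url)   // e.g. upload the closed file here
            }
            writer = nil
            videoInput = nil
        }
    }

You would call append(_:) from captureOutput:didOutputSampleBuffer:fromConnection: and upload each closed file in onSegmentFinished.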
I found another way to stream video here.
This example opens two AVAssetWriters. On the first frame it writes to both files, but immediately closes one of them so that its moov atom gets written. With that moov atom data, the second file can then be used as a pipe to get a stream of encoded video data. The example only handles video data, but it is very clean, easy-to-understand code that helped me figure out how to deal with many issues with video on the iPhone.
I am receiving an MJPEG stream from my camera. When I look at the video data with a hex editor, it seems that it doesn't contain any streaming information: I just see one raw JPEG after another, but no information about the framerate, etc.
Is the lack of any meta information normal for MJPEG, or is it specific to the camera I am using? If there is no information about the stream, how can a player know how fast to play the video?
The lack of metadata is normal. IP cameras typically send MJPEG as just that: one JPEG image after another as a stream. This is the most basic valid MJPEG file. If you were to take a bunch of JPEGs, cat them together into one giant file, and feed it to ffmpeg, it would see it as a valid MJPEG-format file. Some cameras add an additional header to carry audio data, but that is not needed for the stream to be considered valid Motion JPEG.
Many cameras include a header such as X-Framerate in the HTTP headers when the stream is initially sent, or you can set the framerate as part of the camera configuration. However, when a camera sends only JPEGs, there is no way to tell from the stream itself what the framerate is.
To add to what has already been answered: an IP camera is a live video source, and frames are typically presented as soon as they arrive from the camera. IP cameras rarely attach extra per-frame information other than the frame size (some don't even do that; they send only data and separators). Still, some do attach timestamps and extra data such as motion-detection state.
Most IP cameras don't produce a constant frame rate; the frame rate can vary, especially dropping in low-light conditions. It is the responsibility of the receiving side to attach per-frame timestamps when multiplexing the data into a container format. The timestamp might be recovered from metadata (which rarely exists) or, more frequently, the receiver stamps each frame with the local receive time.
This is how a player can play back the video sequence at the proper rate. A live feed is typically presented on a "show the received frame as soon as possible" basis.
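As an illustration of that receiver-side stamping, here is a small Swift sketch of my own (it assumes the camera really does send bare concatenated JPEGs): split the byte stream on the JPEG start/end markers and attach the local arrival time to each frame.

    import Foundation

    struct TimedFrame {
        let jpeg: Data
        let presentationTime: TimeInterval   // stamped locally on receipt
    }

    // Pull complete JPEGs (0xFFD8 ... 0xFFD9) out of the receive buffer and
    // stamp each one with the local receive time, since the stream itself
    // carries no timing information.
    func extractFrames(from buffer: inout Data) -> [TimedFrame] {
        var frames: [TimedFrame] = []
        let soi = Data([0xFF, 0xD8])   // start of image
        let eoi = Data([0xFF, 0xD9])   // end of image
        while let start = buffer.range(of: soi),
              let end = buffer.range(of: eoi, in: start.upperBound..<buffer.endIndex) {
            let jpeg = buffer.subdata(in: start.lowerBound..<end.upperBound)
            frames.append(TimedFrame(jpeg: jpeg,
                                     presentationTime: Date().timeIntervalSince1970))
            buffer.removeSubrange(buffer.startIndex..<end.upperBound)
        }
        return frames
    }

These local timestamps (or a configured frame rate, if the camera exposes one) are what you would then write into the container when muxing.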
Normally, MJPEG data is sent within a streaming media wrapper such as AVI or MOV (QuickTime). The wrapper format will contain the framerate and information about the optional audio data.
Long story short, I am trying to implement a naive solution for streaming video from the iOS camera/microphone to a server.
I am using AVCaptureSession with audio and video AVCaptureOutputs, and then using AVAssetWriter/AVAssetWriterInput to capture video and audio in the captureOutput:didOutputSampleBuffer:fromConnection: method and write the result to a file.
To make this a stream, I am using an NSTimer to break the video into 1-second chunks (by hot-swapping in a different AVAssetWriter with a different outputURL) and uploading these to a server over HTTP.
This is working, but the issue I'm running into is this: the beginnings of the .mp4 files always appear to be missing audio in the first frame, so when the video files are concatenated on the server (using ffmpeg) there is a noticeable audio skip at the joins between files. The video is just fine; no skipping.
I tried many ways of making sure there were no CMSampleBuffers dropped and checked their timestamps to make sure they were going to the right AVAssetWriter, but to no avail.
I checked the AVCam example (with AVCaptureMovieFileOutput) and the AVCaptureLocation example (with AVAssetWriter), and it appears the files they generate do the same thing.
Maybe there is something fundamental I'm misunderstanding here about the nature of audio/video files, as I'm new to video/audio capture, but I thought I'd check before trying to work around this by learning to use ffmpeg to fragment the stream, as some seem to do (if you have any tips on this too, let me know!). Thanks in advance!
I had the same problem and solved it by recording audio with a different API, Audio Queue. This seems to solve it; you just need to take care of timing in order to avoid an audio delay.
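For anyone looking for a starting point, here is a minimal Audio Queue input sketch in Swift of my own (the format, buffer size and buffer count are illustrative; you still need an active AVAudioSession and your own handling of the recorded bytes and their timestamps):

    import AudioToolbox

    // 16-bit mono PCM at 44.1 kHz, purely as an example format.
    var format = AudioStreamBasicDescription(
        mSampleRate: 44100,
        mFormatID: kAudioFormatLinearPCM,
        mFormatFlags: kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked,
        mBytesPerPacket: 2,
        mFramesPerPacket: 1,
        mBytesPerFrame: 2,
        mChannelsPerFrame: 1,
        mBitsPerChannel: 16,
        mReserved: 0)

    // Called each time a buffer of recorded audio is ready. inStartTime tells
    // you when the buffer was captured, which is what you use to keep the
    // audio lined up with the video.
    let inputCallback: AudioQueueInputCallback = { _, queue, buffer, inStartTime, _, _ in
        let bytes = Data(bytes: buffer.pointee.mAudioData,
                         count: Int(buffer.pointee.mAudioDataByteSize))
        // ... hand `bytes` and inStartTime.pointee to your writer here ...
        AudioQueueEnqueueBuffer(queue, buffer, 0, nil)   // recycle the buffer
    }

    var queue: AudioQueueRef?
    AudioQueueNewInput(&format, inputCallback, nil, nil, nil, 0, &queue)

    if let queue = queue {
        // Give the queue a few buffers to fill (roughly 0.1 s each here).
        for _ in 0..<3 {
            var buffer: AudioQueueBufferRef?
            AudioQueueAllocateBuffer(queue, 8192, &buffer)
            if let buffer = buffer {
                AudioQueueEnqueueBuffer(queue, buffer, 0, nil)
            }
        }
        AudioQueueStart(queue, nil)
    }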