TL;DR
I want to use FFmpeg on an iOS device to convert fMP4 fragments to TS segments (for HLS) while the fragments are still being written.
Why?
I'm trying to achieve live uploading on iOS while maintaining a seamless, HD copy locally.
What I've tried
Rolling AVAssetWriters where each writes for 8 seconds, then concatenating the MP4s together via FFmpeg.
What went wrong - There are blips in the audio and video at times. I've identified 3 reasons for this.
1) Priming frames written by the AAC encoder at the start of the audio, which create gaps.
2) Since video frames are 33.33 ms long (at 30 fps) and AAC audio frames roughly 21 to 23 ms long (1024 samples, depending on sample rate), their boundaries may not line up at the end of a file.
3) The lack of frame-accurate encoding on iOS, which is available on Mac OS (details here).
Using FFmpeg to mux a large video-only MP4 file with raw audio into TS segments. The work was based on the Kickflip SDK.
What Went Wrong - Every once in a while an audio-only file would get uploaded, with no video whatsoever. We were never able to reproduce it in-house, but it was pretty upsetting to our users when they didn't record what they thought they had. There were also issues with accurate seeking in the final segments, almost as if the TS segments were incorrectly timestamped.
What I'm thinking now
Apple was pushing fMP4 at WWDC this year (2016), and I hadn't looked into it much before that. Since an fMP4 file can be read and played while it's being written, I thought it would be possible for FFmpeg to transcode the file as it's being written as well, as long as we hold off sending the bytes to FFmpeg until each fragment within the file is finished.
However, I'm not familiar enough with the FFmpeg C API; I only used it briefly in attempt #2.
What I need from you
Is this a feasible solution? Is anybody familiar enough with fMP4 to know if I can actually accomplish this?
How will I know that AVFoundation has finished writing a fragment within the file so that I can pipe it into FFmpeg?
How can I take data from a file on disk, chunk at a time, pass it into FFmpeg and have it spit out TS segments?
Strictly speaking, you don't need to transcode the fMP4 if it contains H.264 + AAC; you just need to repackage the sample data as TS (using ffmpeg -codec copy, or GPAC).
Regarding alignment (points 1 and 2), I suppose this all depends on your encoder settings (frame rate, sample rate and GOP size). It is certainly possible to make sure that audio and video align exactly at fragment boundaries (see for example: this table). If you're targeting iOS, I would recommend using HLS protocol version 3 (or 4), which allows timing to be represented more accurately (version 3 permits floating-point segment durations). This also allows you to stream audio and video separately (non-multiplexed).
I believe ffmpeg should be capable of pushing a live fMP4 stream (i.e. using a long-running HTTP POST), but playout requires origin software to do something meaningful with it (i.e. stream it as HLS).
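On the question of knowing when AVFoundation has finished a fragment: one option that doesn't rely on any AVFoundation callback is to scan the top-level boxes of the growing file and treat a fragment as complete once a 'moof' box and the 'mdat' box right after it are both fully on disk. Below is a minimal Python sketch of that check, assuming the usual fMP4 layout (an initial ftyp/moov followed by moof+mdat pairs); the function name is made up for illustration.

```python
import struct
from typing import List, Tuple

def complete_fragments(buf: bytes, start: int = 0) -> Tuple[List[Tuple[int, int]], int]:
    """Find complete moof+mdat fragments in the bytes of a growing fMP4 file.

    Scans top-level ISO BMFF boxes from `start` and returns (fragments, resume_at):
    each fragment is a (begin, end) byte range covering one complete 'moof' box and
    the 'mdat' box right after it; `resume_at` is the end of the last complete
    fragment (or `start` if none), i.e. where the next scan should begin.
    """
    fragments: List[Tuple[int, int]] = []
    resume_at = start
    pos = start
    moof_start = None
    while pos + 8 <= len(buf):
        size, box_type = struct.unpack(">I4s", buf[pos:pos + 8])
        header = 8
        if size == 1:                      # 64-bit 'largesize' follows the type field
            if pos + 16 > len(buf):
                break
            size = struct.unpack(">Q", buf[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:                    # "extends to end of file": length not known yet
            break
        if size < header or pos + size > len(buf):
            break                          # box not completely written yet
        if box_type == b"moof":
            moof_start = pos
        elif box_type == b"mdat" and moof_start is not None:
            fragments.append((moof_start, pos + size))
            resume_at = pos + size
            moof_start = None
        pos += size
    return fragments, resume_at
```

The bytes before the first fragment (ftyp + moov) are the initialization data FFmpeg needs once; after that, each returned byte range could be piped in as soon as it appears, re-scanning from `resume_at` whenever the file grows.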
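To make the alignment point concrete: the shortest interval at which a video frame boundary and an AAC frame boundary coincide is the least common multiple of the two frame durations. A small Python sketch of that arithmetic, assuming AAC-LC (1024 samples per frame), a constant frame rate, and both streams starting at time 0 (encoder priming is ignored):

```python
from fractions import Fraction
from math import gcd

def aligned_interval(fps_num: int, fps_den: int, sample_rate: int,
                     samples_per_frame: int = 1024) -> Fraction:
    """Shortest duration (seconds) after which video and AAC frame boundaries coincide."""
    video = Fraction(fps_den, fps_num)                 # seconds per video frame
    audio = Fraction(samples_per_frame, sample_rate)   # seconds per AAC frame
    # For fractions in lowest terms: lcm(a/b, c/d) = lcm(a, c) / gcd(b, d)
    a, c = video.numerator, audio.numerator
    return Fraction(a * c // gcd(a, c), gcd(video.denominator, audio.denominator))

for rate in (44100, 48000):
    step = aligned_interval(30, 1, rate)
    print(rate, step, float(step))
```

At 30 fps, 48 kHz audio lines up every 8/15 s (16 video frames, 25 AAC frames), so an 8-second fragment (15 such intervals) ends exactly on both boundaries; at 44.1 kHz the boundaries only coincide every 256/15 s (about 17.07 s).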
Related
I have put together a sample project, https://github.com/liuxuan30/TestH264.git, that uses VideoToolbox to implement a sample H.264 decoder that displays a stream file captured from a camera.
The VideoToolbox-based H.264 decoder was copied from the internet; I didn't write it. When I play my H.264 stream file with it, it plays too fast compared to ffmpeg or ffplay, which both play it back at normal speed.
How can I fix this behaviour? Thanks.
This happens because of this constant kCMSampleAttachmentKey_DisplayImmediately:
If this key is present, the sample should be displayed as soon as possible rather than
according to its presentation timestamp. Use this attachment at run time to request this
behavior from a display pipeline such as the AVSampleBufferDisplayLayer class.
This attachment is not written to media files.
(from Apple's documentation)
So you have two display options:
Display immediately: probably good for a real-time stream, when you need to display each frame as soon as possible.
Display frames at their specific timestamps.
Regarding "comparing to ffmpeg or ffplay, which both played back at a normal speed": ffplay and ffmpeg probably use the timestamps here.
I get the same result as you with your test H.264 file, but that happens because you get all the decoded frames at once, so the decoder displays them immediately.
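For the second option, the idea is to hold each decoded frame until the wall clock reaches its presentation timestamp. A raw .h264 elementary stream typically has no per-frame container timestamps, so here they are simply assumed to be frame_index / fps. This Python sketch models only that scheduling logic (the class and field names are hypothetical); on iOS the rough equivalent is to leave out kCMSampleAttachmentKey_DisplayImmediately and give each sample buffer a real presentation timestamp.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(order=True)
class DecodedFrame:
    pts: float                                         # presentation time, seconds from stream start
    image: bytes = field(default=b"", compare=False)   # stand-in for a decoded picture

class TimestampPacer:
    """Hold decoded frames until the wall clock reaches their presentation time."""

    def __init__(self, playback_start: float):
        self.playback_start = playback_start   # wall-clock time that corresponds to pts 0
        self._pending: List[DecodedFrame] = []

    def push(self, frame: DecodedFrame) -> None:
        self._pending.append(frame)
        self._pending.sort()                    # keep frames in presentation order

    def due(self, now: float) -> List[DecodedFrame]:
        """Frames whose pts has been reached at wall-clock time `now`."""
        elapsed = now - self.playback_start
        ready = [f for f in self._pending if f.pts <= elapsed]
        self._pending = [f for f in self._pending if f.pts > elapsed]
        return ready

# Assumption: the raw stream was captured at a fixed 30 fps.
FPS = 30
pacer = TimestampPacer(playback_start=100.0)
for i in range(5):
    pacer.push(DecodedFrame(pts=i / FPS))
```

Calling `due()` from a display timer releases frames at their intended rate instead of all at once, which is the difference between the two options above.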
You can watch this video for more information about the VideoToolbox framework:
Direct Access to Video Encoding and Decoding
I am trying to write a live video broadcaster over RTSP from an iOS device. I am using AVAssetWriter so I can take advantage of hardware encoding. To send over RTSP I have to get the avcC information out of the MOOV block; however, AVAssetWriter only writes the MOOV block when you finish the session, which of course never happens while I am streaming live.
I have gotten around this for video by encoding, writing, and then finishing a single sample buffer to a file, and then parsing the file to get the avcC information out. That works just fine.
After that, for the live stream, since AVAssetWriter will only write to a file, I write to a file and then read from it with a chasing file offset. With video only, I can read the NALUs from the MDAT atom in the written file without any MOOV information, because each NALU is preceded by a 4-byte field giving its size. So I can read that many bytes, process the NALU, and send it on its way over an RTSP stream. With video only, everything works perfectly and I get a really good HD stream to a streaming server.
The problem I am having now is when I try to incorporate audio from the mic into the stream. I can encode it just fine with AVAssetWriter, and I get a properly interleaved MP4 file to read from; however, unlike the H.264 NALUs, the audio samples in the file are not prefixed with their size. So far the only way I can see to determine the sample sizes is with the STSZ and STCO atoms in the MOOV, which of course I don't have because it is a live stream.
With all that in mind, does anyone know a way to identify audio sample boundaries in an MDAT atom without the information from the MOOV atom? As soon as I figure that out, I'm home free.
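A rough Python sketch of the "parse the file to get the avcC information out" step, assuming the small single-sample file is already in memory and contains one avcC box. Searching for the four bytes avcC is a shortcut rather than a proper box walk, and could in principle match inside sample data. The record layout follows the AVCDecoderConfigurationRecord from ISO/IEC 14496-15; the function and type names are made up.

```python
import struct
from typing import List, NamedTuple

class AvcConfig(NamedTuple):
    profile: int
    level: int
    nalu_length_size: int   # bytes in the length prefix of every NALU in mdat
    sps: List[bytes]
    pps: List[bytes]

def parse_avcc(data: bytes) -> AvcConfig:
    """Find the first 'avcC' box in `data` and parse its AVCDecoderConfigurationRecord."""
    i = data.find(b"avcC")
    if i < 4:
        raise ValueError("no avcC box found")
    box_size = struct.unpack(">I", data[i - 4:i])[0]   # 32-bit size precedes the type
    rec = data[i + 4:i - 4 + box_size]                  # record body after size + type
    length_size = (rec[4] & 0x03) + 1                   # lengthSizeMinusOne: low 2 bits
    pos = 5

    def read_sets(count: int) -> List[bytes]:
        nonlocal pos
        sets = []
        for _ in range(count):
            n = struct.unpack(">H", rec[pos:pos + 2])[0]   # 16-bit length per parameter set
            sets.append(rec[pos + 2:pos + 2 + n])
            pos += 2 + n
        return sets

    num_sps = rec[pos] & 0x1F                            # numOfSequenceParameterSets: low 5 bits
    pos += 1
    sps = read_sets(num_sps)
    num_pps = rec[pos]                                   # numOfPictureParameterSets: full byte
    pos += 1
    pps = read_sets(num_pps)
    return AvcConfig(rec[1], rec[3], length_size, sps, pps)
```

The SPS/PPS bytes are what an RTSP server typically needs for the SDP, and `nalu_length_size` confirms how many bytes precede each NALU in the mdat.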
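The chasing-offset read described above might look roughly like this in Python, assuming `buf` holds the mdat payload read so far (starting after the 8-byte mdat header) and a 4-byte length prefix (the avcC lengthSizeMinusOne field tells you for sure). It only works for a video-only mdat, because any other data in between carries no such prefix. The function name is hypothetical.

```python
from typing import List, Tuple

def read_length_prefixed_nalus(buf: bytes, offset: int,
                               length_size: int = 4) -> Tuple[List[bytes], int]:
    """Extract complete NALUs from `buf` starting at `offset`.

    Each NALU is preceded by a big-endian length field of `length_size` bytes
    (AVCC format, as stored in an MP4 mdat). Returns (nalus, new_offset); a NALU
    whose bytes are not all on disk yet is left for the next call.
    """
    nalus: List[bytes] = []
    while offset + length_size <= len(buf):
        n = int.from_bytes(buf[offset:offset + length_size], "big")
        end = offset + length_size + n
        if end > len(buf):
            break                          # still being written; retry once the file grows
        nalus.append(buf[offset + length_size:end])
        offset = end
    return nalus, offset
```

Each returned NALU has its length prefix stripped, which is the form an RTP packetizer for H.264 usually expects.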
Thanks in advance for any insight.
After a lot of research and emails to people, I at least have an answer, and the answer is: I can't do it this way. Normally, AAC samples in streams that don't have an index are wrapped in ADTS headers, which hold the length field for each packet. However, since I am using AVAssetWriter for the audio, and AVAssetWriter writes directly to an MP4 file, the ADTS wrapper is stripped off because the index will be in the MOOV atom.
Therefore I will have to encode the audio differently, probably through Audio Queue services and meld it into the Video packets when applying to the RTSP stream.
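For reference, this is roughly what the ADTS wrapping mentioned above looks like: a 7-byte header whose 13-bit frame_length field lets a reader find the next frame without any index. A Python sketch, assuming AAC-LC, no CRC and one raw data block per frame; whether an RTSP/RTP packetizer wants ADTS at all depends on the payload format it uses.

```python
from typing import List

SAMPLE_RATE_INDEX = {96000: 0, 88200: 1, 64000: 2, 48000: 3, 44100: 4, 32000: 5,
                     24000: 6, 22050: 7, 16000: 8, 12000: 9, 11025: 10, 8000: 11, 7350: 12}

def adts_header(payload_len: int, sample_rate: int = 44100, channels: int = 2,
                object_type: int = 2) -> bytes:
    """7-byte ADTS header (no CRC) for one raw AAC frame; object_type 2 = AAC-LC."""
    frame_len = payload_len + 7                 # frame_length includes the header itself
    if frame_len > 0x1FFF or not 1 <= channels <= 7:
        raise ValueError("unsupported frame length or channel count")
    profile = object_type - 1
    sf = SAMPLE_RATE_INDEX[sample_rate]
    fullness = 0x7FF                            # buffer fullness 0x7FF signals VBR
    return bytes([
        0xFF,                                   # syncword, high 8 bits
        0xF1,                                   # syncword low 4 bits, MPEG-4, layer 0, no CRC
        (profile << 6) | (sf << 2) | (channels >> 2),
        ((channels & 0x3) << 6) | (frame_len >> 11),
        (frame_len >> 3) & 0xFF,
        ((frame_len & 0x7) << 5) | (fullness >> 6),
        (fullness & 0x3F) << 2,                 # number_of_raw_data_blocks_in_frame = 0
    ])

def adts_frame_length(header: bytes) -> int:
    """Read the 13-bit frame_length back out of an ADTS header."""
    return ((header[3] & 0x3) << 11) | (header[4] << 3) | (header[5] >> 5)

def split_adts(stream: bytes) -> List[bytes]:
    """Split a byte stream of ADTS frames using only the headers (no index needed)."""
    frames, pos = [], 0
    while pos + 7 <= len(stream):
        if stream[pos] != 0xFF or (stream[pos + 1] & 0xF0) != 0xF0:
            raise ValueError("lost ADTS sync")
        n = adts_frame_length(stream[pos:pos + 7])
        if pos + n > len(stream):
            break                               # incomplete final frame
        frames.append(stream[pos:pos + n])
        pos += n
    return frames
```

For 44.1 kHz stereo AAC-LC the header starts with the familiar bytes FF F1 50 80.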
Maybe this will help someone else in the future looking down this same road.
Many thanks to Geraint Davies at http://www.gdcl.co.uk for leading me down the right path.
I've got experience with building iOS apps but don't have experience with video. I want to build an iPhone app that streams real time video to a server. Once on the server I will deliver that video to consumers in real time.
I've read quite a bit of material. Can someone let me know if the following is correct and fill in the blanks for me.
To record video on the iPhone I should use the AVFoundation classes. When using AVCaptureSession, the delegate method captureOutput:didOutputSampleBuffer:fromConnection: gives me access to each frame of video. Now that I have the video frame, I need to encode it.
I know that the AVFoundation classes only offer H.264 encoding via AVAssetWriter, not via a class that easily supports streaming to a web server. Therefore, I am left with writing the video to a file.
I've read other posts saying that you can use two AVAssetWriters to write 10-second blocks and then NSStream those blocks to the server. Can someone explain how to code two AVAssetWriters working together to achieve this? If anyone has code, could they please share it?
You are correct that the only way to use the hardware encoders on the iPhone is by using the AVAssetWriter class to write the encoded video to a file. Unfortunately the AVAssetWriter does not write the moov atom to the file (which is required to decode the encoded video) until the file is closed.
Thus one way to stream the encoded video to a server would be to write 10 second blocks of video to a file, close it, and send that file to the server. I have read that this method can be used with no gaps in playback caused by the closing and opening of files, though I have not attempted this myself.
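The switching decision in a two-writer (or rolling-writer) setup can be kept separate from AVFoundation. Below is a minimal Python sketch of just that decision, assuming you rotate at the first video keyframe at or after 10 seconds so every file starts with a keyframe. Creating the next AVAssetWriter, finishing the old one, and routing audio samples by the same boundary timestamp are left to the caller; SegmentRotator is a hypothetical name.

```python
from typing import Optional

class SegmentRotator:
    """Decide which segment (writer/file) each video frame belongs to."""

    def __init__(self, segment_seconds: float = 10.0):
        self.segment_seconds = segment_seconds
        self.segment_index = 0
        self.segment_start: Optional[float] = None   # pts of the current segment's first frame

    def on_video_frame(self, pts: float, is_keyframe: bool) -> int:
        """Return the segment index for this frame; a change means 'switch writers now'."""
        if self.segment_start is None:
            self.segment_start = pts
        elif is_keyframe and pts - self.segment_start >= self.segment_seconds:
            self.segment_index += 1                   # caller finishes the old writer here
            self.segment_start = pts
        return self.segment_index
```

With a keyframe every 2 seconds at 30 fps, frames 0 to 299 land in segment 0 and frame 300 (pts 10.0 s) starts segment 1.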
I found another way to stream video here.
This example opens 2 AVAssetWriters. Then on the first frame it writes to two files but immediately closes one of the files so the moov atom gets written. Then with the moov atom data the second file can be used as a pipe to get a stream of encoded video data. This example only works for sending video data but it is very clean and easy to understand code that helped me figure out how to deal with many issues with video on the iPhone.
I would like to write an iPhone app that continuously captures video, H.264-encodes it in 10-second intervals, and uploads it to a storage server. This can be done with AVAssetWriter, and I can keep deleting the old files as I create new ones. However, as flash memory has a limited number of write cycles, this scheme will wear out the flash after a few thousand write cycles. Is there a way to redirect AVAssetWriter to memory, or to create a RAM drive on the iPhone?
Thanks!
Yes, AVAssetWriter is the only way to get to the hardware encoder, and simply reading back the file while it's being written doesn't give you the moov atoms, so AVFoundation- or MediaPlayer-based players won't be able to read it back. You only have a couple of choices: periodically stop the AVAssetWriter and write to the file on a background thread, effectively segmenting your movie into smaller complete files; or deal with the incomplete MP4 on the server side, where you will have to parse the raw NALUs and recreate the missing moov atoms. If you're using ffmpeg, mov.c is the source to look at; this is also where an incomplete MP4 file would fail.
As of Flash 10.1, they have added the ability to add bytes into the NetStream object via the appendBytes method (described here http://www.bytearray.org/?p=1689). The main reason for this addition is that Adobe is finally supporting HTTP streaming of video. This is great, but it seems that you need to use the Adobe Media Streaming Server (http://www.adobe.com/products/httpdynamicstreaming/) to create the correct video chunks from your existing video to allow for smooth streaming.
I have tried to do a hacked version of HTTP streaming in the past where I swap out the NetStream objects (similar to here http://video.leizhu.com/video.html), but there is always a momentary pause between the chunks. With the new appendBytes, I tried to do a quick mock up with the two sections of video from the preceding site, but even then, the skip still remains.
Does anyone know how two consecutive .FLV files need to be formatted for the appendBytes method on the NetStream object to produce smooth video without a noticeable skip between the segments?
I was able to get this working using Adobe's File Packager tool, which Samuel described. I didn't use the NetStream object directly, but I used the OSMF Sample Player, which I assume uses it internally. Here's how to do it without using FMS:
Get Adobe's File Packager for Http Dynamic Streaming from http://www.adobe.com/products/httpdynamicstreaming/
Run the File Packager on an existing MP4 file containing H.264/AAC like this:
C:\Program Files\Adobe\Flash Media Server 4\tools\f4fpackager>
f4fpackager.exe --input-file="MyFile.mp4" --segment-duration=30
This will produce 30-second F4F files, along with the F4X and F4M files. The F4F files are your correctly segmented (and fragmented) MP4 files that should play.
If you want to test this using the OSMF Player also do the following:
Get Apache Server
Get Adobe's Http Origin Module for Apache from http://www.adobe.com/products/httpdynamicstreaming/
Install the module according to http://help.adobe.com/en_US/HTTPStreaming/1.0/Using/WS8d6ed60bd880807c48597a9e1265edd6cc0-8000.html
Put the F4F, F4X and F4M files into the vod directory under httpdocs
Get the “OSMF Sample Player for HTTP Dynamic Streaming” from http://www.osmf.org/downloads/OSFMPlayer_zeri2.zip
Put the Sample Player in the httpdocs directory
Load the HTML file from the Sample Player in a browser, e.g. http://localhost/OSMFPlayer.html
Press the eject button and put in the URL of your F4M file, it should play
So, to answer the original question: Adobe's File Packager is the file splitter to use; you don't need to buy FMS to use it, and it works for FLV and MP4/F4V files.
You don't need to use their server. Wowza supports Adobe's version of HTTP Streaming and you can implement it yourself by segmenting the videos properly and loading all the segments on a standard HTTP server.
Links to all the specs for Adobe's HTTP Streaming are here:
http://help.adobe.com/en_US/HTTPStreaming/1.0/Using/WS9463dbe8dbe45c4c-1ae425bf126054c4d3f-7fff.html
Trying to hack the client to do some custom style http streaming will be a lot more troublesome.
Note that HTTP Streaming does not support streaming several different videos; it streams a single file that has been broken into separate segments.
File Packager
A command-line tool that translates on-demand media files into fragments and writes the fragments to F4F files. The File Packager is an offline tool. You can use the File Packager to encrypt files for use with Flash Access. For more information, see Packaging on-demand media.
The File Packager is available from adobe.com and is installed with Adobe® Flash® Media Server to the rootinstall/tools/f4fpackager folder.
The Packager download link is here: Download File Packager for HTTP Dynamic Streaming
http://www.adobe.com/products/httpdynamicstreaming/
You could use F4Pack, a GUI around Adobe's command-line tool that lets you process your FLV/F4V files so they can be used for HTTP Dynamic Streaming.
The place in the OSMF code where this happens is the timer-fired state machine inside of the HTTPNetStream class implementation... might be an informative read. I think I even put some helpful comments in there when I wrote it.
As far as the general question:
If you read an entire FLV file into a ByteArray and pass it to appendBytes, it will play. If you break that FLV file in half, and pass the first half as a byte array and then the second half as a byte array, that will play as well.
If you want to be able to switch between bitrates without a gap, you need to split your FLV files at matching keyframe points, and remember that only the first call to appendBytes includes the initial FLV file header ('F', 'L', 'V', version, flags, offset); the rest just expect a continuation of the FLV byte sequence.
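A Python sketch of preparing a second FLV file for appending after a first one, under the assumptions that both files use the same codec settings and the split falls on a keyframe: drop the FLV header and PreviousTagSize0, and shift every tag timestamp so the sequence continues where the previous file ended. The tag layout follows the FLV spec (11-byte tag header, then data, then a 4-byte PreviousTagSize); the function name and offset convention are my own.

```python
import struct

def flv_tags_for_append(flv: bytes, ts_offset_ms: int) -> bytes:
    """Return the tag section of an FLV file with every tag timestamp shifted.

    Drops the FLV header and PreviousTagSize0 so the result can follow an earlier
    file in the same appendBytes byte stream; each tag keeps its trailing
    PreviousTagSize field.
    """
    if flv[:3] != b"FLV":
        raise ValueError("not an FLV file")
    data_offset = struct.unpack(">I", flv[5:9])[0]   # header length, normally 9
    pos = data_offset + 4                             # skip header + PreviousTagSize0
    out = bytearray()
    while pos + 11 <= len(flv):
        data_size = int.from_bytes(flv[pos + 1:pos + 4], "big")
        end = pos + 11 + data_size + 4                # tag header + data + PreviousTagSize
        if end > len(flv):
            break                                     # truncated final tag: stop here
        ts = int.from_bytes(flv[pos + 4:pos + 7], "big") | (flv[pos + 7] << 24)
        ts = (ts + ts_offset_ms) & 0xFFFFFFFF
        tag = bytearray(flv[pos:end])
        tag[4:7] = (ts & 0xFFFFFF).to_bytes(3, "big")  # Timestamp (lower 24 bits)
        tag[7] = ts >> 24                               # TimestampExtended (upper 8 bits)
        out += tag
        pos = end
    return bytes(out)
```

The first file would be appended whole; each later file goes through this with `ts_offset_ms` set to where the previous one ended (working out that value, e.g. from its last tag timestamp plus one frame duration, is up to you).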
I recently found a similar project for Node.js that does m3u8 transcoding (https://github.com/andrewschaaf/media-server), but I have yet to hear of any besides Wowza doing it outside of the Origin module for Apache. Since the payloads are nearly identical, you're better off looking for a good MP4 segmenting solution (there are plenty out there) than for F4M segmenting. The problem is that moov atoms, especially on larger MP4 videos, are difficult to manage and to put in their proper initial location (near the beginning of the file). Even with optimal ffmpeg settings and 'qtfaststart', you end up with noticeably slower seeking, inefficient (usually greedy) bandwidth usage, and a few minor headaches with scrubbing/time that you don't get with FLV/F4V playback.
In my player I switch (or intend to switch) between HTTP Dynamic Streaming (HDS) and MP4 based on load and on real-time parsing of Apache logs with awk/cron, instead of licensing Adobe's Access product for stream protection. Both have their own 'onmetadata' handlers, but in the end I receive virtually equivalent sequenced time/byte hashes; MP4 is just slower. So mod_origin is just a synchronizer/request router for Flash clients (over HTTP). I'm still looking for ways to speed up MP4-container-based playback.
One incredible solution I read about recently, and was rather awestruck by, is http://zehfernando.com/2011/flash-video-frame-time-woes/, where a video editor and a Flash developer came up with their own MP4 timecoding solution: via an Adobe Premiere script, they added about 50 pixels to the bottom of every video frame containing a visual 'binary' stamp, like a frame barcode, and those binary values translate into highly accurate timecode values. Flash could then analyze the video frames as they were painted (in real time) and determine precisely where the player was and which bytes were needed from any MP4 byte-segmenting-friendly web server.
The thing is (and perhaps I'm wrong here), Flash seems to choose arbitrarily when it gets to the moov data, especially on large video files (0.5 to 1.5 GB), even if you run your MP4 through MP4Box first (e.g. MP4Box -frag 10000 -inter 0 movie.mp4). I guess this is a problem OSMF and HDS have solved quite well now, though it is annoying (IMO) that you need Apache and a proprietary closed-source module to use it. It's probably just a matter of time before open-source implementations arrive, as HDS is only 1 to 2 years old and just needs a little reverse engineering, like what Andrew Schaaf did with Node.js + MPEG-TS streaming (live or not).
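A toy Python version of the frame-stamp idea from that article, to show why it survives lossy encoding: each bit of the frame number is drawn as a block of black or white pixels, and the reader samples only block centres and thresholds at mid-grey, so moderate compression noise doesn't change the decoded value. Bit count, block width and threshold are assumptions, not the article's actual scheme.

```python
from typing import List

def encode_stamp(frame_number: int, bits: int = 16, block: int = 4) -> List[int]:
    """One row of luma values: each bit (MSB first) becomes `block` pixels, 255 = 1, 0 = 0."""
    row: List[int] = []
    for b in reversed(range(bits)):
        row.extend([255 if (frame_number >> b) & 1 else 0] * block)
    return row

def decode_stamp(row: List[int], bits: int = 16, block: int = 4) -> int:
    """Sample the centre pixel of each block and threshold at mid-grey (128)."""
    n = 0
    for i in range(bits):
        n = (n << 1) | (1 if row[i * block + block // 2] >= 128 else 0)
    return n

def stamp_to_seconds(frame_number: int, fps: float = 30.0) -> float:
    """Convert a decoded frame number to a timecode, assuming a constant frame rate."""
    return frame_number / fps
```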
In the end I may just use OSMF exclusively beneath my UI, as it seems to have similar virtues to HDS, if not more (e.g. Strobe, if you need an extensible open HDS or MP4 player platform to hack on and build your own custom player).
Adobe's F4F format is based on MP4 files; are you able to use F4V or MP4 instead of FLV files?
There are plenty of MP4 file splitters around, but you would need to make sure the timestamps in the files are continuous. Maybe the pause happens when the player sees a zero timestamp within the audio or video stream inside the file.
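A trivial Python check for that, assuming you have already extracted each segment's first timestamp and end timestamp (last timestamp plus its duration), in seconds, with whatever tool you use; the tolerance value is arbitrary and the function name is made up.

```python
from typing import List, Tuple

def find_discontinuities(segments: List[Tuple[float, float]],
                         tolerance: float = 0.001) -> List[int]:
    """Return indices of segments that don't start where the previous one ended.

    `segments` holds (first_ts, end_ts) pairs in playback order; a segment whose
    timestamps restart at zero shows up here as well.
    """
    return [i for i in range(1, len(segments))
            if abs(segments[i][0] - segments[i - 1][1]) > tolerance]
```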