Append video frames to mp4 while playing - iOS

I need to append video frames taken from a stream to an existing mp4 file.
I found that I need to use the mdat and stco atoms to update the chunk table.
Is it possible to append new frames/chunks at all? I need to do this in an iOS app.

Such appending is possible, but it may not be practical -- especially if you want to add one frame at a time.
MP4 is very much a "baked" format that is difficult to modify after it has been generated. This is because the structural data in the "moov" box contains file offsets to important media data in the "mdat" box. If the moov box is positioned before the mdat box (common to allow progressive download), then any data added to the moov box (i.e. references to new keyframes in your appended video) will push the mdat box further away. Not only would you have to rewrite the file, but you'd need to update all the file offsets accordingly. (Perhaps there are some clever tricks for keeping the moov box size constant...) If the mdat box is positioned first, the operation would still be awkward because you'd need to copy the moov box into memory, append your new video to the end of the mdat, update moov fields accordingly, and append the new moov box.
If you can gather all new video frames, and at the end of the recording add them during a rewrite operation, this may be workable. You could also look into Fragmented MP4 (using "moof" boxes), but I'm not sure how widespread the support for reading such files is.
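To make the offset bookkeeping concrete, here is a minimal Python sketch (not iOS code, and not a full MP4 parser) of the step described above: when the moov box grows by some number of bytes ahead of mdat, every chunk offset in stco has to be increased by the same amount. The names iter_boxes, shift_chunk_offsets and make_box are made up for illustration. The sketch assumes 32-bit stco tables only (files using 64-bit co64 tables would need the same treatment) and only descends through moov/trak/mdia/minf/stbl.

```python
import struct

# Container boxes on the path from the file root down to the chunk offset table.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}

def iter_boxes(buf, start=0, end=None):
    """Yield (type, payload_start, box_end) for each box in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:    # a 64-bit size follows the type field
            size = struct.unpack_from(">Q", buf, pos + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing range
            size = end - pos
        if size < header:
            break        # malformed box, stop rather than loop forever
        yield box_type, pos + header, pos + size
        pos += size

def shift_chunk_offsets(buf, delta, start=0, end=None):
    """Add delta to every stco entry in place; buf must be a bytearray."""
    patched = 0
    for box_type, payload, box_end in iter_boxes(buf, start, end):
        if box_type in CONTAINERS:
            patched += shift_chunk_offsets(buf, delta, payload, box_end)
        elif box_type == b"stco":
            # Payload layout: version/flags (4 bytes), entry count (4 bytes),
            # then one 32-bit file offset per chunk.
            count = struct.unpack_from(">I", buf, payload + 4)[0]
            for i in range(count):
                at = payload + 8 + 4 * i
                old = struct.unpack_from(">I", buf, at)[0]
                struct.pack_into(">I", buf, at, old + delta)
            patched += count
    return patched

def make_box(box_type, payload):
    """Hypothetical helper that builds a toy box for the demo below."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

# Toy moov with a single track whose two chunks sit at offsets 100 and 200.
stco = make_box(b"stco", struct.pack(">IIII", 0, 2, 100, 200))
moov = make_box(b"moov", make_box(b"trak", make_box(b"mdia",
       make_box(b"minf", make_box(b"stbl", stco)))))
data = bytearray(moov)

# Pretend moov grew by 48 bytes, pushing mdat 48 bytes further into the file.
patched = shift_chunk_offsets(data, 48)
```

In a real file the same pass would have to run over every track, and the sample tables (stsz, stts, stsc and friends) would also need new entries for the appended frames, which is why rewriting once at the end of recording tends to be simpler than patching per frame.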

Related

How do you make Media Source work with timestampOffset lower than appendWindowStart?

I want to use appendBuffer and append only piece of the media I have.
To cut the piece from the end, I use appendWindowEnd and it works.
To cut it from the beginning I have to set timestampOffset lower than appendWindowStart. I have seen shaka-player doing something similar.
var appendWindowStart = Math.max(0, currentPeriod.startTime - windowFudge);
var appendWindowEnd = followingPeriod ? followingPeriod.startTime : duration;
...
var timestampOffset = currentPeriod.startTime - mediaState.stream.presentationTimeOffset;
From my tests, it works when timestampOffset is the same as appendWindowStart or 1/10 second lower.
It doesn't work when timestampOffset is lower than that: the segment doesn't get added. Does that have something to do with my media, or does the spec/implementation not allow it?
From MDN web docs:
The appendWindowStart property of the SourceBuffer interface controls the timestamp for the start of the append window, a timestamp range that can be used to filter what media data is appended to the SourceBuffer. Coded media frames with timestamps within this range will be appended, whereas those outside the range will be filtered out.
Just found this in the specification, so I am updating the question:
If presentation timestamp is less than appendWindowStart, then set the need random access point flag to true, drop the coded frame, and jump to the top of the loop to start processing the next coded frame.
Some implementations may choose to collect some of these coded frames with presentation timestamp less than appendWindowStart and use them to generate a splice at the first coded frame that has a presentation timestamp greater than or equal to appendWindowStart even if that frame is not a random access point. Supporting this requires multiple decoders or faster than real-time decoding so for now this behavior will not be a normative requirement.
If frame end timestamp is greater than appendWindowEnd, then set the need random access point flag to true, drop the coded frame, and jump to the top of the loop to start processing the next coded frame.
Some implementations may choose to collect coded frames with presentation timestamp less than appendWindowEnd and frame end timestamp greater than appendWindowEnd and use them to generate a splice across the portion of the collected coded frames within the append window at time of collection, and the beginning portion of later processed frames which only partially overlap the end of the collected coded frames. Supporting this requires multiple decoders or faster than real-time decoding so for now this behavior will not be a normative requirement. In conjunction with collecting coded frames that span appendWindowStart, implementations may thus support gapless audio splicing.
If the need random access point flag on track buffer equals true, then run the following steps:
If the coded frame is not a random access point, then drop the coded frame and jump to the top of the loop to start processing the next coded frame.
Set the need random access point flag on track buffer to false.
and
Random Access Point
A position in a media segment where decoding and continuous playback can begin without relying on any previous data in the segment. For video this tends to be the location of I-frames. In the case of audio, most audio frames can be treated as a random access point. Since video tracks tend to have a more sparse distribution of random access points, the location of these points are usually considered the random access points for multiplexed streams.
Does that mean that, for video, I have to choose a timestampOffset that lands on an I-frame?
The use of timestampOffset doesn't require an I-frame. It just shifts the timestamp of each frame by that value. That shift is applied before anything else (before appendWindowStart gets involved).
It's the use of appendWindowStart that is affected by where your I-frames are.
appendWindowStart and appendWindowEnd act as an AND over the data you're adding.
MSE doesn't reprocess your data. By setting appendWindowStart, you're telling the source buffer that any data prior to that time is to be excluded.
Also, MSE works at the fundamental level of the GOP (group of pictures): from one I-frame to the next.
So let's imagine this sequence of frames, made of 16-frame GOPs, with each frame having a duration of 1s.
.IPPPPPPPPPPPPPPP IPPPPPPPPPPPPPPP IPPPPPPPPPPPPPPP IPPPPPPPPPPPPPPP
Say you now set appendWindowStart to 10.
In an ideal world you would have:
. PPPPPPP IPPPPPPPPPPPPPPP IPPPPPPPPPPPPPPP IPPPPPPPPPPPPPPP
All 9 previous frames with a start time prior to appendWindowStart have been dropped.
However, those P-frames can no longer be decoded, so the spec has MSE set the "need random access point flag" to true, and the next frame added to the source buffer can only be an I-frame,
and so you end up with this in your source buffer:
. IPPPPPPPPPPPPPPP IPPPPPPPPPPPPPPP IPPPPPPPPPPPPPPP
Being able to add the frames between appendWindowStart and the next I-frame would be incredibly hard and time-consuming.
It would require decoding all frames before adding them to the source buffer, and storing them either as raw YUV data or, if hardware accelerated, as GPU-backed images.
A source buffer could contain over a minute of video at any given time. Imagine if it had to deal with decompressed data rather than compressed data.
Now, if we wanted to preserve the same memory constraint as now (which is around 100MiB of data maximum per source buffer), you would have to recompress the content on the fly before adding it to the source buffer.
Not gonna happen.
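The filtering described above can be played through with a small simulation. This is only a Python sketch of the quoted spec steps, not how any browser implements them: the Frame class and filter_frames function are invented for illustration, frames are assumed to arrive in order with PTS equal to decode order, and the flag is assumed to start out true for a fresh track buffer.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    pts: float       # presentation timestamp in seconds
    duration: float  # display duration in seconds
    keyframe: bool   # True for a random access point (I-frame)

def filter_frames(frames, timestamp_offset=0.0,
                  window_start=0.0, window_end=float("inf")):
    """Return the frames that would survive the append window."""
    kept = []
    need_rap = True  # assumed initial state: wait for a random access point
    for f in frames:
        pts = f.pts + timestamp_offset  # the offset is applied first
        end = pts + f.duration
        if pts < window_start or end > window_end:
            # Outside the append window: drop the frame, and require a
            # random access point before accepting anything again.
            need_rap = True
            continue
        if need_rap:
            if not f.keyframe:
                continue  # P-frame whose reference frames were dropped
            need_rap = False
        kept.append(Frame(pts, f.duration, f.keyframe))
    return kept

# Four GOPs of 16 one-second frames, as in the diagram above.
frames = [Frame(float(i), 1.0, i % 16 == 0) for i in range(64)]

# appendWindowStart = 10: nothing is kept until the I-frame at t = 16.
cut_mid_gop = filter_frames(frames, window_start=10.0)

# timestampOffset = -16 with appendWindowStart = 0: the cut lands exactly
# on an I-frame, so the shifted stream is kept from t = 0 onwards.
cut_on_keyframe = filter_frames(frames, timestamp_offset=-16.0)
```

In both runs 48 frames survive; the difference is only where the first kept frame lands on the timeline. That suggests why an offset slightly below appendWindowStart can appear to work while a larger one drops the whole segment: it depends on whether an I-frame still falls inside the window after the shift.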

What's the best way to composite frame-based animated stickers over recorded video?

We want to allow the user to place animated "stickers" over video that they record in the app and are considering different ways to composite these stickers.
Create a video in code from the frame-based animated stickers (which can be rotated, and have translations applied to them) using AVAssetWriter. The problem is that AVAssetWriter only writes to a file and doesn't keep transparency. This would prevent us from being able to overlay it on the video using AVMutableComposition.
Create .mov files ahead of time for our frame-based stickers and composite them using AVMutableComposition and layer instructions with transformations. The problem with this is that there are no tools for easily converting our PNG-based frames to a .mov while maintaining an alpha channel, so we'd have to write our own.
Create separate CALayers for each frame in the sticker animations. This could potentially create a very large number of layers, depending on the frame rate of the video.
Or any better ideas?
Thanks.
I would suggest that you take a look at my blog post on this specific subject. Basically, this example shows how RGBA video data can be loaded from a file attached to the app resources. This is imported from a .mov that contains Animation RGBA data on the desktop. A conversion step is required to get the data from the desktop into iOS, since plain H.264 cannot support an alpha channel directly (as you have discovered). Note that older hardware may have issues decoding an H.264 user-recorded video and then another one on top of that, so this approach of using the CPU instead of the H.264 hardware for the sticker is actually better.

Set interlacing information in QuickTime

I'm trying to set the correct interlacing information via the QuickTime 7 API on a movie that I am creating.
I want to make my movie progressive scan but when I visually check the output, every frame is squashed into the top half. So even though I make sure QuickTime knows my movie is kQTFieldsProgressiveScan it still gets confused.
This is what I am doing:
myCreateNewMovie(...);
ICMCompressionSessionOptionsCreate(...);
BeginMediaEdits(media);
myCreate(ImageDescription with appropriate FieldInfoImageDescriptionExtension2);
SetMediaSampleDescription(media, ImageDescription);
and then when writing each frame I add the same description:
ICMImageDescriptionSetProperty(myFieldInfoImageDescription, ...);
AddMediaSample2(...);
From various bits and pieces on the net I got the impression that the sample description set on the media was getting overwritten. Now I'm setting the FieldInfo data inside my ICM Encoded Frame Output callback and it seems to be satisfactory.

Creating a Motion JPEG frame by frame with variable frame-rate

I'm analyzing a number of solutions to the problem that I have in hand: I'm receiving images from a device and I need to make a video file out of it. However, the images arrive with a somewhat random delay between them and I'm looking for the best way to encode this. I have to create this video frame by frame, and after each frame I must have a new video file with the new frame, replacing the old video file.
I was thinking of fixing the frame rate a little "faster" than the minimum delay that I might get and just repeating the last frame until a new one arrives, but I guess that this solution is not optimal.
Also, this project is made with Delphi (no, I cannot change that) and I need a means to turn these frames into a video file after each frame. I was thinking about using mencoder as an external tool, but I'm reading the documentation and still haven't found an option to make it insert a frame into an already encoded Motion JPEG video file. As my images come in as JPEG, I thought that it would be reasonable to use Motion JPEG, but not even this is certain yet. Also, I don't know if mencoder can be used as a library. It would help a lot if it could.
What would you suggest?
There are some media container formats that support variable frame rate, but I don't think MJPEG is a good choice because of the storage overhead. I believe the best way would be to transcode the JPEG frames to MP4 format using both I-frames and P-frames.
You can use FFMPEG Delphi/FP header files for the transcoding.
Edit:
The most up-to-date version of the FFMPEG headers can be found in the GLScene repository on SourceForge.net. To view the files you can use this link
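To put rough numbers on the fixed-rate versus variable-rate choice discussed above, here is a small language-neutral sketch in Python (it is not Delphi and does not call FFMPEG). The function names and the arrival times are invented for illustration, and it assumes each image stays on screen until the next one arrives.

```python
def vfr_durations(arrivals_ms, last_duration_ms):
    """Per-frame display durations for a variable-frame-rate file:
    each image is shown until the next one arrives."""
    gaps = [b - a for a, b in zip(arrivals_ms, arrivals_ms[1:])]
    return gaps + [last_duration_ms]

def cfr_frame_count(arrivals_ms, fps, last_duration_ms):
    """Frames needed if the last image is repeated on a fixed-rate grid."""
    total_ms = arrivals_ms[-1] - arrivals_ms[0] + last_duration_ms
    # Integer ceiling of total_ms * fps / 1000, avoiding float rounding.
    return -(-total_ms * fps // 1000)

# Hypothetical arrival times in milliseconds; the shortest gap is 50 ms.
arrivals = [0, 400, 450, 1700, 2000]

# Variable frame rate: one stored frame per image, with its own duration.
durations = vfr_durations(arrivals, last_duration_ms=500)

# Fixed 25 fps (a little faster than the 50 ms minimum gap): the same
# 2.5 seconds need many repeated frames.
repeated = cfr_frame_count(arrivals, fps=25, last_duration_ms=500)
```

With these made-up numbers the variable-rate file stores 5 frames and the fixed-rate file 63. With I-frames and P-frames the repeated frames would compress well, but with Motion JPEG each repeat costs a full image.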

How to find the source video size using VMR9 renderless mode

My application uses VMR9 Renderless mode to play a WMV file. I build a filter graph with IGraphBuilder::RenderFile and control playback with IMediaControl. Everything plays okay, but I can't figure out how to determine the source video size. Any ideas?
Note: This question was asked before in How can I adjust the video to a specified size in VMR9 renderless mode?. But the solution was to use Windowless mode instead of Renderless mode, which would require rewriting my code.
First you need to find the video renderer. You can do this by using EnumFilters on the IGraphBuilder interface. Then call EnumPins on that filter to find the input pin. You can then call ConnectionMediaType to get the media type being fed into that filter. Depending on what formattype is set to, you can cast the pbFormat pointer to the relevant structure (for example VIDEOINFOHEADER for FORMAT_VideoInfo, or VIDEOINFOHEADER2 for FORMAT_VideoInfo2) and read the video size from its bmiHeader (biWidth and biHeight). If you want the size before that (to see if some scaling is going on), you can work your way back across the pin using "ConnectedTo" to get the next filter back. You can then find its input pins and repeat the ConnectionMediaType call. Repeat until you get to the pin of the filter that you want.
You could use the MediaInfo project at http://mediainfo.sourceforge.net/hr/Download/Windows and through the CS wrapper included in the VCS2010 or VCS2008 folders get all the information about a video you need.
EDIT: Sorry, I thought you were using managed code. Either way, MediaInfo can be used, so maybe it helps.

Resources