I have a series of MP4 files (H.264 video, AAC audio at 16 kHz). I need to merge them together programmatically (Objective-C, iOS), but the final file will be too large to hold in memory, so I can't use the AVFoundation framework to do this for me.
I have written code which does the merge and takes care of all of the MP4 atoms (STBL, STSZ, STCO, etc.) based on simply concatenating the contents of the respective MDATs. The problem I have is that while the resultant file plays, the audio gradually gets out of sync with the video. What seems to be happening is that there is a disparity between the audio and video lengths in each file, which gets worse the more files I concatenate.
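To illustrate the kind of fix-up involved: when the second file's MDAT payload is appended, each of its chunk offsets has to be rebased to the payload's new position in the merged file. A rough sketch (not my actual code; 32-bit STCO offsets assumed, CO64 would need 64-bit values):

    #include <stdint.h>
    #include <stddef.h>

    // Rebase file B's stco entries after its mdat payload has been appended
    // to the merged file. oldPayloadStart is where B's payload sat in B,
    // newPayloadStart is where that same payload now sits in the merged file.
    static void RebaseChunkOffsets(uint32_t *offsets, size_t count,
                                   uint64_t oldPayloadStart,
                                   uint64_t newPayloadStart)
    {
        for (size_t i = 0; i < count; i++)
            offsets[i] = (uint32_t)((uint64_t)offsets[i]
                                    - oldPayloadStart + newPayloadStart);
    }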
I've used MP4Box to generate a file from the command line and it is 'similar but different' to my output. A notable difference is that the length of the MDAT has changed and the chunk offsets have also changed (though the sample sizes remain consistent).
I've recently read that AAC encoding introduces padding at the beginning and end of a stream, so I wonder if this is something I need to handle.
Q: Given two MDAT atoms containing H.264 encoded data and AAC audio, is my basic method sound, or do I need to introspect the MDAT data in some way?
Thanks for the pointer, Niels.
So it seems that the approach is perfectly reasonable; however, each individual MP4 file has a marginal difference between the audio length and the video length due to differences between the sampling frequencies. The MP4s include an EDTS.ELST combination which corrects this issue for that file. I was failing to consider the EDTS when I merged files. Merging the EDTS atoms has fixed the issue.
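In case it helps anyone, here is a minimal sketch of what appending one track's ELST entries after another's can look like (the entry layout is the version 0 elst box from ISO/IEC 14496-12; the media_time shift is my reading of the adjustment involved, so treat it as a starting point rather than a recipe):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    // elst version 0 entry, per ISO/IEC 14496-12.
    typedef struct {
        uint32_t segment_duration;   // movie-timescale units
        int32_t  media_time;         // media-timescale units, -1 = empty edit
        int16_t  media_rate_integer;
        int16_t  media_rate_fraction;
    } ElstEntryV0;

    // Append b's entries after a's, shifting b's media_time by the media
    // duration already present in the merged track (aMediaDuration).
    // Returns a newly allocated array of aCount + bCount entries.
    static ElstEntryV0 *MergeEditLists(const ElstEntryV0 *a, size_t aCount,
                                       const ElstEntryV0 *b, size_t bCount,
                                       int32_t aMediaDuration)
    {
        ElstEntryV0 *merged = malloc((aCount + bCount) * sizeof *merged);
        memcpy(merged, a, aCount * sizeof *merged);
        for (size_t i = 0; i < bCount; i++) {
            merged[aCount + i] = b[i];
            if (b[i].media_time != -1)          // leave empty edits untouched
                merged[aCount + i].media_time += aMediaDuration;
        }
        return merged;
    }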
Related
I am trying to capture an image of the screen very frequently and then encode it in AVCC format. Using the SPS, PPS, and AVCC NAL units of the encoded frames I am creating fragmented MP4 video, pushing those fragments to the cloud, and concatenating all of them to form one big MP4 video file. It works very well as long as all the fragments have the same SPS and PPS, but if there are multiple sets of SPS and PPS in a single video it's not playable. Please let me know which box should be modified to accommodate this.
You have multiple options here, and you should investigate them since the behaviour can differ between players:
Use multiple sample entries, each with its own AVCConfigurationBox, and then reference the proper one via sample_description_index in the stsc box. I think this should be the most reliable, but then you need to update the moov when a new SPS/PPS arrives.
There is a thing called a parameter set stream (see 5.3.5, "AVC parameter set stream definition", in ISO/IEC 14496-15); each sample in that stream contains a new AVCConfigurationBox. No idea how widely supported this is.
Always put the new SPS/PPS inline (sketched below). Again, some parsers may ignore the inline SPS/PPS and use the ones from the sample description box.
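For the third option, a minimal sketch of what an in-band sample could look like in AVCC framing, assuming 4-byte NAL length prefixes (lengthSizeMinusOne == 3 in your avcC); the buffers and NAL pointers are placeholders:

    #include <stdint.h>
    #include <string.h>

    // Write one NAL unit with a 4-byte big-endian length prefix.
    static size_t AppendNal(uint8_t *dst, const uint8_t *nal, uint32_t nalLen)
    {
        dst[0] = (uint8_t)(nalLen >> 24);
        dst[1] = (uint8_t)(nalLen >> 16);
        dst[2] = (uint8_t)(nalLen >> 8);
        dst[3] = (uint8_t)(nalLen);
        memcpy(dst + 4, nal, nalLen);
        return 4 + nalLen;
    }

    // sample = [len][SPS][len][PPS][len][IDR slice]
    static size_t BuildInlineSample(uint8_t *sample,
                                    const uint8_t *sps, uint32_t spsLen,
                                    const uint8_t *pps, uint32_t ppsLen,
                                    const uint8_t *idr, uint32_t idrLen)
    {
        size_t off = 0;
        off += AppendNal(sample + off, sps, spsLen);
        off += AppendNal(sample + off, pps, ppsLen);
        off += AppendNal(sample + off, idr, idrLen);
        return off;   // this becomes the stsz entry for the sample
    }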
I am manually generating a .mov video file.
Here is a link to an example file: link. I wrote a few image frames, and then after a long break wrote approximately 15 more image frames just to emphasise my point for debugging purposes. When I extract images from the video, ffmpeg returns around 400 frames instead of the 15-20 I expected. Is this because the API I am using is inserting these frames automatically? Is it a part of the .mov file format that requires this? Or is it due to the way the library is extracting the image frames from the video? I have tried searching the internet but could not arrive at an answer.
My use case is that I am trying to write the current sensor data (from Core Motion) while writing a video. For each frame I receive from the camera, I use "appendPixelBuffer" to write the frame to the video and then write a row of sensor data to the CSV file.
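Roughly, the per-frame path looks like this (a simplified sketch; the adaptor, motion manager, and CSV file handle properties are placeholders for my real ones):

    #import <AVFoundation/AVFoundation.h>
    #import <CoreMotion/CoreMotion.h>

    // In the AVCaptureVideoDataOutputSampleBufferDelegate:
    // one appended pixel buffer == one CSV row.
    - (void)captureOutput:(AVCaptureOutput *)output
    didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
           fromConnection:(AVCaptureConnection *)connection
    {
        CMTime pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
        CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);

        if (self.writerInput.readyForMoreMediaData &&
            [self.pixelBufferAdaptor appendPixelBuffer:pixelBuffer
                                  withPresentationTime:pts]) {
            CMAttitude *attitude = self.motionManager.deviceMotion.attitude;
            NSString *row = [NSString stringWithFormat:@"%f,%f,%f,%f\n",
                             CMTimeGetSeconds(pts),
                             attitude.roll, attitude.pitch, attitude.yaw];
            [self.csvHandle writeData:[row dataUsingEncoding:NSUTF8StringEncoding]];
        }
    }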
The end result I want is a 1:1 ratio of frames in the video to rows in the CSV file. I have confirmed I am writing the CSV file correctly using various counters etc., so my issue is clearly my understanding of the movie format or the API.
Thanks for any help.
UPDATED
It looks like your ffmpeg extraction command is wrong. To extract only the timestamped frames (and not frames sampled at 24 Hz) from your file, try this:
ffmpeg -i video.mov -r 1/1 image-%03d.jpeg
This gives me the 20 frames expected.
OLD ANSWER
ffprobe reports that your video has a frame rate of 2.19 frames/s and a duration of 17 s, which gives 2.19 * 17 ≈ 37 frames, which is much closer to your expected 15-20 than ffmpeg's 400.
So maybe the ffmpeg extractor is at fault?
Hard to say if you don't show how you encode and decode the file.
I want to extract a few clips from a recorded WAV file. I am not finding much help online regarding this. I understand we can't split compressed formats like MP3, but how do we do it with CAF/WAV files?
One approach you may consider would be to calculate and read the bytes from the audio file and write them to a new file. Because you are dealing with LPCM formats, the calculations are relatively simple.
If, for example, you have a file of 16-bit mono LPCM audio sampled at 44.1 kHz that is one minute in duration, then you have a total of (60 sec x 44100 Hz) 2,646,000 samples. Times 2 bytes per sample gives a total of 5,292,000 bytes. And if you want the audio from 10 sec to 30 sec, then you need to read the bytes from offset 882,000 to 2,646,000 and write them to a separate file.
There is a bit of code involved, but it can be done using the Audio File Services functions from the AudioToolbox framework.
Functions you'll need to use are AudioFileOpenURL, AudioFileCreateWithURL, AudioFileReadBytes, AudioFileWriteBytes, and AudioFileClose.
An algorithm would be something like this:
You first set up an AudioFileID, which is an opaque type that gets passed to the AudioFileCreateWithURL function. Then open the file you wish to splice up using AudioFileOpenURL.
Calculate the start and end bytes of what you want to copy.
Next, preferably in a loop, read in the bytes and write them to the new file. AudioFileReadBytes and AudioFileWriteBytes allow you to do this. What's good is that you can read and write whatever number of bytes you decide on each iteration of the loop.
When finished, close the new file and the original using AudioFileClose.
Then repeat for each clip (audio extraction) to be written.
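A minimal sketch of the above, assuming an LPCM source such as WAV or CAF (URLs and times are placeholders, and error handling is trimmed to early returns):

    #include <AudioToolbox/AudioToolbox.h>

    // Copy the LPCM audio between startSec and endSec from srcURL into a new
    // WAV file at dstURL.
    static void CopyAudioRange(CFURLRef srcURL, CFURLRef dstURL,
                               Float64 startSec, Float64 endSec)
    {
        AudioFileID srcFile = NULL, dstFile = NULL;
        if (AudioFileOpenURL(srcURL, kAudioFileReadPermission, 0, &srcFile) != noErr)
            return;

        // Read the source format so the new file matches it exactly.
        AudioStreamBasicDescription asbd;
        UInt32 propSize = sizeof(asbd);
        if (AudioFileGetProperty(srcFile, kAudioFilePropertyDataFormat,
                                 &propSize, &asbd) != noErr ||
            AudioFileCreateWithURL(dstURL, kAudioFileWAVEType, &asbd,
                                   kAudioFileFlags_EraseFile, &dstFile) != noErr) {
            AudioFileClose(srcFile);
            return;
        }

        // Byte offsets: seconds * sample rate * bytes per frame.
        SInt64 readPos  = (SInt64)(startSec * asbd.mSampleRate) * asbd.mBytesPerFrame;
        SInt64 endByte  = (SInt64)(endSec   * asbd.mSampleRate) * asbd.mBytesPerFrame;
        SInt64 writePos = 0;

        enum { kChunkSize = 32 * 1024 };
        char buffer[kChunkSize];

        while (readPos < endByte) {
            SInt64 remaining = endByte - readPos;
            UInt32 numBytes = remaining < kChunkSize ? (UInt32)remaining : kChunkSize;
            if (AudioFileReadBytes(srcFile, false, readPos, &numBytes, buffer) != noErr
                || numBytes == 0)
                break;                               // end of file or error
            AudioFileWriteBytes(dstFile, false, writePos, &numBytes, buffer);
            readPos  += numBytes;
            writePos += numBytes;
        }

        AudioFileClose(dstFile);
        AudioFileClose(srcFile);
    }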
On an additional note, you could split a compressed format by converting it to LPCM first.
I'm trying to simultaneously read and write an H.264 MOV file written by AVAssetWriter. I managed to extract individual NAL units, pack them into FFmpeg's AVPackets, and write them into another video format using FFmpeg. It works, and the resulting file plays well, except the playback speed is not right. How do I calculate the correct PTS/DTS values from raw H.264 data? Or maybe there exists some other way to get them?
Here's what I've tried:
Limit the capture min/max frame rate to 30 and assume that the output file will be 30 fps. In fact, its fps is always less than the values that I set. Also, I think the fps is not constant from packet to packet.
Remember each written sample's presentation timestamp, assume that samples map one-to-one to NALUs, and apply the saved timestamps to the output packets. This doesn't work.
Set the PTS to 0 or AV_NOPTS_VALUE. This doesn't work either.
From googling about it, I understand that raw H.264 data usually doesn't contain any timing info. It can sometimes have some timing info inside SEI, but the files that I use don't have it. On the other hand, there are some applications that do exactly what I'm trying to do, so I suppose it must be possible somehow.
You will either have to generate them yourself, or access the atoms containing timing information in the MP4/MOV container to generate the PTS/DTS information. FFmpeg's mov.c in libavformat might help.
Each sample/frame you write with AVAssetWriter will map one-to-one with the VCL NALs. If all you are doing is converting containers, then have FFmpeg do all the heavy lifting. It will properly maintain the timing information when going from one container format to another.
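A stripped-down remux loop with the current libavformat API looks roughly like this (paths are placeholders, error handling is trimmed, and the important call is av_packet_rescale_ts, which converts packet timestamps between the input and output stream time bases):

    #include <libavformat/avformat.h>
    #include <libavcodec/avcodec.h>

    // Remux inPath into outPath (container picked from the file extension),
    // copying the streams and rescaling timestamps.
    static int remux(const char *inPath, const char *outPath)
    {
        AVFormatContext *in = NULL, *out = NULL;
        AVPacket *pkt = av_packet_alloc();
        int ret;

        if ((ret = avformat_open_input(&in, inPath, NULL, NULL)) < 0)
            goto end;
        if ((ret = avformat_find_stream_info(in, NULL)) < 0)
            goto end;
        if ((ret = avformat_alloc_output_context2(&out, NULL, NULL, outPath)) < 0)
            goto end;

        for (unsigned i = 0; i < in->nb_streams; i++) {
            AVStream *os = avformat_new_stream(out, NULL);
            avcodec_parameters_copy(os->codecpar, in->streams[i]->codecpar);
            os->codecpar->codec_tag = 0;
        }

        if (!(out->oformat->flags & AVFMT_NOFILE) &&
            (ret = avio_open(&out->pb, outPath, AVIO_FLAG_WRITE)) < 0)
            goto end;
        if ((ret = avformat_write_header(out, NULL)) < 0)
            goto end;

        while (av_read_frame(in, pkt) >= 0) {
            // This is what keeps the playback speed right: pts/dts/duration
            // move from the input stream's time base to the output's.
            av_packet_rescale_ts(pkt,
                                 in->streams[pkt->stream_index]->time_base,
                                 out->streams[pkt->stream_index]->time_base);
            av_interleaved_write_frame(out, pkt);
            av_packet_unref(pkt);
        }
        ret = av_write_trailer(out);

    end:
        av_packet_free(&pkt);
        avformat_close_input(&in);
        if (out && !(out->oformat->flags & AVFMT_NOFILE))
            avio_closep(&out->pb);
        avformat_free_context(out);
        return ret;
    }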
The bitstream generated by AVAssetWriter does not contain SEI data. It only contains SPS/PPS/I/P frames. The SPS also does not contain VUI or HRD parameters.
-- Edit --
Also, keep in mind that if you are saving PTS information from the CMSampleBufferRefs, then the time base may be different from that of the target container. For instance, the AVFoundation time base is in nanoseconds, while an FLV file uses milliseconds.
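If you do end up carrying the CMSampleBufferRef timestamps across yourself, the conversion is just a time-base rescale; a tiny sketch (the output time base is whatever your target stream uses, and the function name is mine):

    #import <CoreMedia/CoreMedia.h>
    #include <libavutil/mathematics.h>

    // Convert a sample buffer's presentation timestamp into an FFmpeg output
    // stream's time base (e.g. for AVPacket.pts).
    static int64_t PacketPTSFromSampleBuffer(CMSampleBufferRef sample,
                                             AVRational outTimeBase)
    {
        CMTime pts = CMSampleBufferGetPresentationTimeStamp(sample);
        // A CMTime is value/timescale, i.e. a time base of 1/timescale.
        AVRational srcTimeBase = { 1, (int)pts.timescale };
        return av_rescale_q(pts.value, srcTimeBase, outTimeBase);
    }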
I'm using the BASS.dll library and all I want to do is to "redirect" part of an MP3 I'm playing (using, for example, BASS_StreamCreateFile) to another file (which may be MP3 or WAV). I don't know how to start. I'm trying to use the help to find an answer, but still nothing. I can play this stream and read some data I need. Now I need to copy part of the file, for example from 2:00 to 2:10 (or by position).
Any ideas on how I should start?
Regards,
J.K.
Well, I don't know BASS specifically, but I know a little about music playback and compressed data formats in general, and copying the data around properly involves an intermediate decoding step. Here's what you'll need to do:
Open the file and find the correct position.
Decode the audio into an in-memory buffer. The size of your buffer should be (LengthInSeconds * SamplesPerSecond * Channels * BytesPerSample) bytes. So if it's 10 seconds of CD-quality audio, that's 10 * 44100 * 2 (stereo) * 2 (16-bit audio) = 1,764,000 bytes.
Take this buffer of decoded data and feed it into an MP3 encoding function, and save the resulting MP3 to a file.
If BASS has functions for decoding to an external buffer and for encoding a buffer to MP3, you're good; all you have to do is figure out which ones to use. If not, you'll have to find another library for MP3 encoding and decoding.
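From a quick look at the BASS documentation, decoding channels appear to cover step 2, so the decode-to-buffer side might look something like this (an untested sketch; you would still need an encoder such as LAME or the BASSenc add-on for step 3):

    #include "bass.h"
    #include <stdlib.h>

    // Decode the audio between startSec and endSec of an MP3 into a raw PCM
    // buffer (error checks omitted). Returns the buffer; *outBytes gets its size.
    static void *DecodeRange(const char *path, double startSec, double endSec,
                             DWORD *outBytes)
    {
        // BASS_STREAM_DECODE gives a channel you pull data from rather than play.
        HSTREAM chan = BASS_StreamCreateFile(FALSE, path, 0, 0, BASS_STREAM_DECODE);

        QWORD startByte = BASS_ChannelSeconds2Bytes(chan, startSec);
        QWORD endByte   = BASS_ChannelSeconds2Bytes(chan, endSec);
        BASS_ChannelSetPosition(chan, startByte, BASS_POS_BYTE);

        *outBytes = (DWORD)(endByte - startByte);
        void *buffer = malloc(*outBytes);

        // Pull the decoded 16-bit PCM into the buffer.
        BASS_ChannelGetData(chan, buffer, *outBytes);

        BASS_StreamFree(chan);
        return buffer;   // feed this to your MP3/WAV encoder of choice
    }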
Also, watch out for generational loss. MP3 uses lossy compression, so if you decompress and recompress the data multiple times, it'll hurt the sound quality.