I have a bare h.264 file (from a raspberry pi camera), and I'd like to wrap it as an mp4. I don't need to play it, edit it, add or remove anything, or access the pixels.
Lots of people have asked about compiling ffmpeg for iOS, or streaming live data. But given the lack of easy translation between the ffmpeg command line and its iOS build, it's very difficult for me to figure out how to implement this simple command:
ffmpeg -i input.h264 -vcodec copy out.mp4
I don't specifically care whether this happens via ffmpeg, avconv, or AVFoundation (or something else). It just seems like it should be not-this-hard to do on a device.
It is not hard but requires some work and attention to detail.
Here is my best guess:
read PPS/SPS from your input.h264
extract height & width from SPS
generate avcC header from PPS/SPS
create an AVAssetWriter with file type AVFileTypeQuickTimeMovie
create an AVAssetWriterInput
add the AVAssetWriterInput as AVMediaTypeVideo with your height & width to the AVAssetWriter
read from your input.h264 (likely in Annex B format) one NALs at a time
convert your NALs from your input.h264 from start code prefixed (0 0 1; Annex B) to size prefixed (mp4 format)
drop NALs of type AU, PPS, SPS
create a CMSampleBuffer for each NAL and add a CMFormatDescription with the avcC header
regenerate timestamps starting a zero using the known frame rate (watch out if your frames are reordered)
append your CMSampleBuffer to your AVAssetWriterInput
goto 7 until EOF
Related
I have been using the Superpowered iOS library to analyse audio and extract BPM, loudness, pitch data. I'm working on an iOS Swift 3.0 project and have been able to get the C classes work with Swift using the Bridging headers for ObjC.
The problem I am running into is that whilst I can create a decoder object, extract audio from the Music Library and store it as a .WAV - I am unable to create a decoder object for just snippets of the extracted audio and get the analyser class to return data.
My approach has been to create a decoder object as follows:
var decodeAttempt = decoder!.open(self.originalFilePath, metaOnly: false, offset: offsetBytes, length: lengthBytes, stemsIndex: 0)
'offsetBytes' and 'LengthBytes' I think are the position within the audio file. As I have already decompressed audio, stored it as WAV and then am providing it to the decoder here, I am calculating the offset and length using the PCM Wave audio formula of 44100 x 2 x 16 / 8 = 176400 bytes per second. Then using this to specify a start point and length in bytes. I'm not sure that this is the correct way to do this as the decoder will return 'Unknown file format'.
Any ideas or even alternative suggestions of how to achieve the title of this question? Thanks in advance!
The offset and length parameters of the SuperpoweredDecoder are there because of the Android APK file format, where bundled audio files are simply concatenated to the package.
Despite a WAV file is as "uncompressed" as it can be, there is a header at the beginning, so offset and length are not a good way for this purpose. Especially as the header is present at the beginning only, and without the header decoding is not possible.
You mention that you can extract audio to PCM (and save to WAV). Then you have the answer in your hand: just submit different extracted portions to different instances of the SuperpoweredOfflineAnalyzer.
I am manually generated a .mov video file.
Here is a link to an example file: link, I wrote a few image frames, and then after a long break wrote approximately 15 image frames just to emphasise my point for debuting purposes. When I extract images from the video ffmpeg returns around 400 frames instead of the 15-20 I expected. Is this because the API i am using is inserting these image files automatically? Is it a part of the .mov file format that requires this? Or is it due to the way the library is extracting the image frames from the video? I have tried searching the internet but could not arrive at an answer.
My use case is that I am trying to write the current "sensor data" (from core motion) from core motion while writing a video. For each frame I receive from the camera, I use "AppendPixelBuffer" to write the frame to the video and then
Thanks for any help. The end result is I want a 1:1 ratio of Frames in the video to rows in the CSV file. I have confirmed I am writing the CSV file correctly using various counters etc. So my issue is cleariy the understanding of the movie format or API.
Thanks for any help.
UPDATED
It looks like your ffmpeg extractor is wrong. To extract only the timestamped frames (and not frames sampled at 24Hz) in your file, try this:
ffmpeg -i video.mov -r 1/1 image-%03d.jpeg
This gives me the 20 frames expected.
OLD ANSWER
ffprobe reports that your video has a frame rate of 2.19 frames/s and a duration of 17s, which gives 2.19 * 17 = 37 frames, which is closer to your expected 15-20 than ffmpeg's 400.
So maybe the ffmpeg extractor is at fault?
Hard to say if you don't show how you encode and decode the file.
I have a series of MP4 files (H.264 video, AAC audio, 16KHz). I need to merge them together programmatically (Objective-C, iOS) but the final file will be too large to hold in memory so I can't use the AVFramework to do this for me.
I have written code which will do the merge and takes care of all of the MP4 atoms (STBL, STSZ, STCO etc.) based on just concatenating the contents of the respective MDATS. The problem I have is that while the resultant file plays, the audio gradually gets out of sync with the video. What seems to be happening is that there is a disparity between the audio and video length in each file which gets worse the more files I concatenate.
I've used MP4Box to generate a file from command line and it is 'similar but different' to my output. A notable different is that the length of the MDAT has changed and the chunk offsets have also changed (though sample sizes remain consistent).
I've recently read that AAC encoding introduces padding at the beginning and end of a stream so wonder if this is something I need to handle.
Q: Given two MDAT atoms containing H264 encoded data and AAC audio, is my basic method sound or do I need to introspect the MDAT data in some way.
Thanks for pointer Niels
So it seems that the approach is perfectly reasonable however each individual MP4 file has marginal differences between the audio length and video length due to differences between the sampling frequency. The MP4s include an EDTS.ELST combination which correct this issue for that file. I was failing to consider the EDTS when I merged files. Merging EDTS has fixed the issue.
I'm trying to simultaneously read and write H.264 mov file written by AVAssetWriter. I managed to extract individual NAL units, pack them into ffmpeg's AVPackets and write them into another video format using ffmpeg. It works and the resulting file plays well except the playback speed is not right. How do I calculate the correct PTS/DTS values from raw H.264 data? Or maybe there exists some other way to get them?
Here's what I've tried:
Limit capture min/max frame rate to 30 and assume that the output file will be 30 fps. In fact its fps is always less than values that I set. And also, I think the fps is not constant from packet to packet.
Remember each written sample's presentation timestamp and assume that samples map one-to-one to NALUs and apply saved timestamp to output packet. This doesn't work.
Setting PTS to 0 or AV_NOPTS_VALUE. Doesn't work.
From googling about it I understand that raw H.264 data usually doesn't contain any timing info. It can sometimes have some timing info inside SEI, but the files that I use don't have it. On the other hand, there are some applications that do exactly what I'm trying to do, so I suppose it is possible somehow.
You will either have to generate them yourself, or access the Atom's containing timing information in the MP4/MOV container to generate PTS/DTS information. FFmpeg's mov.c in libavformat might help.
Each sample/frame you write with AVAssetWriter will map one to one with the VCL NALs. If all you are doing is converting then have FFmpeg do all the heavy lifting. It will properly maintain the timing information when going from one container format to another.
The bitstream generated by AVAssetWriter does not contain SEI data. It only contains SPS/PPS/I/P frames. The SPS also does not contain VUI or HRD parameters.
-- Edit --
Also, keep in mind that if you are saving PTS information from the CMSampleBufferRef's then the time base may be different from that of the target container. For instance AVFoundation time base is nanoseconds, and a FLV file is milliseconds.
Im using BASS.dll library and all I want to do is to "redirect" part of MP3 Im playing using for example BASS_StreamCreateFile to another file (may be MP3 or WAVe). I dont know how to start? Im trying to use help to find an answer, but still nothing. I can play this stream. Read some data I need. Now I need to copy ile for example from 2:00 to 2:10 (or by position).
Any ideas how should I start?
Regards,
J.K.
Well, I don't know BASS specifically, but I know a little about music playing and compressed data formats in general, and copying the data around properly involves an intermediate decoding step. Here's what you'll need to do:
Open the file and find the correct position.
Decode the audio into an in-memory buffer. The size of your buffer should be (LengthInSeconds * SamplesPerSecond * Channels * BytesPerSample) bytes. So if it's 10 seconds of CD quality audio, that's 10 * 44100 * 2 (stereo) * 2 (16-bit audio) = 1764000 bytes.
Take this buffer of decoded data and feed it into an MP3 encoding function, and save the resulting MP3 to a file.
If BASS has functions for decoding to an external buffer and for encoding a buffer to MP3, you're good; all you have to do is figure out which ones to use. If not, you'll have to find another library for MP3 encoding and decoding.
Also, watch out for generational loss. MP3 uses lossy compression, so if you decompress and recompress the data multiple times, it'll hurt the sound quality.