Does OpenCV 3.0 Still Have Limits On VideoWriter Size? - opencv

OpenCV 2.4's VideoWriter couldn't save video files larger than 2 GB, since it only accepted .avi files. I am wondering whether this is still the case in OpenCV 3.0, or whether it can save other kinds of video files that don't have this limitation.
I tried to find any documentation pointing to a 2 GB limit, or a release note saying it can handle larger files, but I couldn't find anything.

Even though the OpenCV 3.0-beta documentation states otherwise, OpenCV 3.0's VideoWriter seems to handle other file formats, such as mkv, as shown in this issue.
I adapted the code from the above issue to generate a 4GB mkv video (4096 frames of random 2048x2048).
The thing to be aware of is that the image size must be passed as width then height to the VideoWriter, whereas the numpy array must be initialized with height then width; VideoWriter will fail silently otherwise.
You will also need a recent OpenCV 3.0 source build to handle uncompressed streams.
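For reference, here is a minimal Python sketch of that approach (illustrative only: it assumes an OpenCV 3.0 build with FFmpeg support, and FFV1 is just one lossless FourCC choice among many):

    import numpy as np
    import cv2

    # VideoWriter takes (width, height); numpy frames are (height, width, channels).
    width, height, n_frames, fps = 2048, 2048, 4096, 30

    fourcc = cv2.VideoWriter_fourcc(*'FFV1')          # any FourCC your FFmpeg build supports
    writer = cv2.VideoWriter('big_video.mkv', fourcc, fps, (width, height))
    if not writer.isOpened():
        raise RuntimeError('VideoWriter failed to open - check codec/container support')

    for _ in range(n_frames):
        frame = np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)
        writer.write(frame)                           # silently dropped if the frame shape is wrong

    writer.release()

Writing 4096 random 2048x2048 frames this way produces a file well past the old 2 GB AVI ceiling.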

This is not an OpenCV limitation. An AVI file cannot be larger than 2 GB due to a format limitation: the 4-byte size field, read as a signed integer, has a maximum value of 2,147,483,647.
Is it possible to pack the video into another container (MKV, etc.) with OpenCV?
the RIFF header has the following form:
'RIFF' fileSize fileType (data)
where 'RIFF' is the literal FOURCC code 'RIFF',
fileSize is a 4-byte value giving the size of the data in the file,
and fileType is a FOURCC that identifies the specific file type.
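To see that field concretely, the 12-byte RIFF header can be inspected with a few lines of Python (the file name is just a placeholder):

    import struct

    # 4-byte 'RIFF' tag, 4-byte little-endian size, 4-byte file type (e.g. 'AVI ').
    with open('video.avi', 'rb') as f:
        riff, file_size, file_type = struct.unpack('<4sI4s', f.read(12))

    print(riff, file_type)   # b'RIFF' b'AVI '
    print(file_size)         # size of everything after the first 8 bytes
    # Readers that treat this 32-bit field as signed cap it at 2,147,483,647,
    # which is where the ~2 GB AVI limit comes from.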

Related

In OpenCV many conversions to JPG using imEncode fail

For a specific purpose I am trying to convert an AVI video to a kind of Motion JPEG format using OpenCV. In order to do so I read images from the source video, convert them to JPEG using imEncode, and write these JPEG images to the target video.
After several hundreds of frames suddenly the size of the resulting JPEG image nearly doubles. Here's a list of sizes:
68045
68145
68139
67885
67521
67461
67537
67420
67578
67573
67577
67635
67700
67751
127800
127899
127508
127302
126990
126904
Anybody got a clue what's going on here?
By the way: I'm using OpenCV.Net as a wrapper for OpenCV.
Thanks a lot in advance,
Paul
I found the solution. If I explicitly pass the third parameter to imEncode (for JPEG encoding this is the quality setting, ranging from 0 to 100) instead of relying on the default (95), the problem disappears. It's likely this is a bug in OpenCV.Net, but it could also be a bug in OpenCV itself.
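For reference, passing the quality parameter explicitly looks like this in OpenCV's Python API (OpenCV.Net exposes an equivalent parameter array; the input image here is just a stand-in for a frame decoded from the source video):

    import cv2

    frame = cv2.imread('frame.png')                     # stand-in for a decoded source frame

    # Explicitly request JPEG quality 95 instead of relying on the default.
    ok, jpeg_bytes = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 95])
    if not ok:
        raise RuntimeError('JPEG encoding failed')
    print(len(jpeg_bytes))                              # encoded size in bytes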

How to wrap an h.264 file as mp4 on iOS

I have a bare h.264 file (from a raspberry pi camera), and I'd like to wrap it as an mp4. I don't need to play it, edit it, add or remove anything, or access the pixels.
Lots of people have asked about compiling ffmpeg for iOS, or streaming live data. But given the lack of easy translation between the ffmpeg command line and its iOS build, it's very difficult for me to figure out how to implement this simple command:
ffmpeg -i input.h264 -vcodec copy out.mp4
I don't specifically care whether this happens via ffmpeg, avconv, or AVFoundation (or something else). It just seems like it should be not-this-hard to do on a device.
It is not hard but requires some work and attention to detail.
Here is my best guess:
1. read the PPS/SPS from your input.h264
2. extract height & width from the SPS
3. generate an avcC header from the PPS/SPS
4. create an AVAssetWriter with file type AVFileTypeQuickTimeMovie
5. create an AVAssetWriterInput
6. add the AVAssetWriterInput as AVMediaTypeVideo with your height & width to the AVAssetWriter
7. read from your input.h264 (likely in Annex B format) one NAL at a time
8. convert each NAL from start code prefixed (0 0 1; Annex B) to size prefixed (mp4 format); a byte-level sketch of this step follows the list
9. drop NALs of type AU delimiter, PPS, and SPS
10. create a CMSampleBuffer for each NAL and add a CMFormatDescription with the avcC header
11. regenerate timestamps starting at zero using the known frame rate (watch out if your frames are reordered)
12. append your CMSampleBuffer to your AVAssetWriterInput
13. goto 7 until EOF
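Step 8 is pure byte manipulation and is the part people most often get wrong. Here is a rough, illustrative sketch of just that step in Python (the surrounding AVAssetWriter/CMSampleBuffer plumbing has to be written in Objective-C or Swift):

    import struct

    def annexb_to_length_prefixed(data):
        """Split an Annex B stream on 00 00 01 / 00 00 00 01 start codes and
        re-emit each NAL with a 4-byte big-endian length prefix (MP4 style)."""
        out = []
        pos = data.find(b'\x00\x00\x01')
        while pos != -1:
            start = pos + 3
            nxt = data.find(b'\x00\x00\x01', start)
            end = len(data) if nxt == -1 else nxt
            # A 4-byte start code (00 00 00 01) leaves one stray zero before 'nxt'.
            if nxt != -1 and data[end - 1] == 0:
                end -= 1
            nal = data[start:end]
            out.append(struct.pack('>I', len(nal)) + nal)
            pos = nxt
        return b''.join(out)

    with open('input.h264', 'rb') as f:
        length_prefixed = annexb_to_length_prefixed(f.read())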

Merge MDAT atoms of MP4 files

I have a series of MP4 files (H.264 video, AAC audio, 16 kHz). I need to merge them together programmatically (Objective-C, iOS), but the final file will be too large to hold in memory, so I can't use AVFoundation to do this for me.
I have written code which does the merge and takes care of all of the MP4 atoms (STBL, STSZ, STCO etc.), based on simply concatenating the contents of the respective MDATs. The problem I have is that while the resultant file plays, the audio gradually gets out of sync with the video. What seems to be happening is that there is a disparity between the audio and video length in each file, which gets worse the more files I concatenate.
I've used MP4Box to generate a file from the command line and it is 'similar but different' to my output. A notable difference is that the length of the MDAT has changed and the chunk offsets have also changed (though the sample sizes remain consistent).
I've recently read that AAC encoding introduces padding at the beginning and end of a stream so wonder if this is something I need to handle.
Q: Given two MDAT atoms containing H.264 encoded data and AAC audio, is my basic method sound, or do I need to introspect the MDAT data in some way?
Thanks for the pointer, Niels.
So it seems that the approach is perfectly reasonable; however, each individual MP4 file has marginal differences between the audio length and the video length due to differences between the sampling frequencies. The MP4s include an EDTS.ELST combination which corrects this issue for that file. I was failing to consider the EDTS when I merged files. Merging the EDTS atoms has fixed the issue.
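To make the drift concrete, here is a back-of-the-envelope illustration (the per-segment durations are made up): a small audio/video length mismatch in each file accumulates with every concatenated segment unless the edit-list correction is carried forward.

    # Hypothetical per-segment track lengths in seconds (illustrative numbers only).
    video_len = 10.000      # video track length of one segment
    audio_len = 10.032      # AAC track length of the same segment (encoder padding)

    for n in (1, 10, 50):
        drift = n * (audio_len - video_len)
        print('%d segments merged -> audio leads video by %.3f s' % (n, drift))
    # 1 segment: 0.032 s (inaudible); 50 segments: 1.600 s (clearly out of sync).
    # The per-file edts/elst atom hides this offset, so it must be preserved when
    # the mdats are concatenated.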

Generating a 16-bits per channel PNG file procedurally

Is there any way to generate a 16-bit-per-channel (RGBA) PNG file using D3DX11SaveTextureToFile?
Or any version of DirectX, any image library (C++), any image format.
I tried to use the sample code here:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb205131(v=vs.85).aspx
and modified the function names to D3D11 version.
The program works perfectly when I set desc.Format to DXGI_FORMAT_R8G8B8A8_UNORM.
But D3DX11SaveTextureToFile returns E_FAIL when I change desc.Format to DXGI_FORMAT_R16G16B16A16_UNORM.
I've tried DevIL (Developer's Image Library), but it doesn't support 16-bit-per-channel PNG files.
The only image file format which can save all texture formats is D3DX11_IFF_DDS. It seems that D3DX11SaveTextureToFile can't save 16-bit PNGs. One possibility is to extract the image data of your texture and save it manually with one of the approaches (e.g. OpenCV or libpng) discussed here: Writing 16 bit uncompressed image using OpenCV.
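As a concrete example of the OpenCV route, writing a 16-bit-per-channel PNG is straightforward once the pixel data has been copied back from the texture into a uint16 buffer (a minimal Python sketch; the C++ cv::imwrite call behaves the same way). Note that OpenCV expects BGR(A) channel order:

    import numpy as np
    import cv2

    width, height = 640, 480

    # Stand-in for texture data read back from the GPU: 4 channels, 16 bits each.
    rgba16 = np.random.randint(0, 65536, (height, width, 4), dtype=np.uint16)

    # OpenCV writes channels in BGRA order, so swap R and B first.
    bgra16 = rgba16[:, :, [2, 1, 0, 3]]
    cv2.imwrite('texture_16bit.png', bgra16)   # 16-bit PNG is chosen from the dtype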

Get PTS from raw H264 mdat generated by iOS AVAssetWriter

I'm trying to simultaneously read and write an H.264 .mov file written by AVAssetWriter. I managed to extract individual NAL units, pack them into ffmpeg's AVPackets, and write them into another video format using ffmpeg. It works, and the resulting file plays well, except that the playback speed is not right. How do I calculate the correct PTS/DTS values from the raw H.264 data? Or maybe there exists some other way to get them?
Here's what I've tried:
Limit the capture min/max frame rate to 30 and assume that the output file will be 30 fps. In fact its fps is always less than the values that I set, and I don't think the fps is constant from packet to packet.
Remember each written sample's presentation timestamp, assume that samples map one-to-one to NALUs, and apply the saved timestamps to the output packets. This doesn't work.
Set PTS to 0 or AV_NOPTS_VALUE. This doesn't work either.
From googling about it I understand that raw H.264 data usually doesn't contain any timing info. It can sometimes have some timing info inside SEI, but the files that I use don't have it. On the other hand, there are some applications that do exactly what I'm trying to do, so I suppose it is possible somehow.
You will either have to generate them yourself, or access the atoms containing timing information in the MP4/MOV container to generate the PTS/DTS information. FFmpeg's mov.c in libavformat might help.
Each sample/frame you write with AVAssetWriter will map one-to-one to the VCL NALs. If all you are doing is converting, then have FFmpeg do all the heavy lifting. It will properly maintain the timing information when going from one container format to another.
The bitstream generated by AVAssetWriter does not contain SEI data. It only contains SPS/PPS/I/P frames. The SPS also does not contain VUI or HRD parameters.
-- Edit --
Also, keep in mind that if you are saving PTS information from the CMSampleBufferRefs, then the time base may be different from that of the target container. For instance, AVFoundation's time base is nanoseconds, while an FLV file uses milliseconds.
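A minimal sketch of the "generate them yourself" option, assuming a constant, known frame rate and no B-frame reordering (the 90 kHz time base is a common convention, not something read from the file):

    from fractions import Fraction

    fps = 30
    time_base = Fraction(1, 90000)                       # 90 kHz ticks

    # With a constant frame rate and no reordering, PTS == DTS and both advance
    # by exactly one frame duration per packet.
    frame_duration = int(Fraction(1, fps) / time_base)   # 3000 ticks per frame

    for i in range(5):
        pts = i * frame_duration
        print('frame %d: pts = dts = %d (%.3f s)' % (i, pts, float(pts * time_base)))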
