VTCompressionSession Bitrate/Datarate overshooting - iOS

I have been working on an H264 hardware accelerated encoder implementation using VideoToolbox's VTCompressionSession for a while now, and a consistent problem has been the unreliable bitrate coming out of it. I have read many forum posts and looked through existing code, and tried to follow suit, but the bitrate out of my encoder is almost always somewhere between 5% and 50% off what it is set at, and on occasion I've seen some huge errors, such as 400% overshoot, where a single frame is twice the size implied by the configured average bitrate.
My session is setup as follows:
kVTCompressionPropertyKey_AverageBitRate = desired bitrate
kVTCompressionPropertyKey_DataRateLimits = [desired bitrate / 8, 1]; accounting for bits vs bytes
kVTCompressionPropertyKey_ExpectedFrameRate = framerate (30, 15, 5, or 1 fps)
kVTCompressionPropertyKey_MaxKeyFrameInterval = 1500
kVTCompressionPropertyKey_MaxKeyFrameIntervalDuration = 1500 / framerate
kVTCompressionPropertyKey_AllowFrameReordering = NO
kVTCompressionPropertyKey_ProfileLevel = kVTProfileLevel_H264_Main_AutoLevel
kVTCompressionPropertyKey_RealTime = YES
kVTCompressionPropertyKey_H264EntropyMode = kVTH264EntropyMode_CABAC
kVTCompressionPropertyKey_BaseLayerFrameRate = framerate / 2
And I adjust the average bitrate and data-rate values throughout the session to try to compensate for the volatility (if the output is too high I reduce them a bit, if too low I increase them, with restrictions on how far to go in either direction).
I create the session and then apply the above configuration as a single dictionary using VTSessionSetProperties and feed frames into it like this:
VTCompressionSessionEncodeFrame(compressionSessionRef,
                                static_cast<CVImageBufferRef>(pixelBuffer),
CMTimeMake(capturetime, 1000),
kCMTimeInvalid,
frameProperties,
frameDetailsStruct,
&encodeInfoFlags);
So I'm supplying timing information as the API says to do.
Then I add up the size of the output for each frame and divide over a periodic time window to determine the outgoing bitrate and its error from the desired value. This is where I see the significant volatility.
I'm looking for any help in getting the bitrate under control, as I'm not sure what to do at this point. Thank you!

I think you can check the frameTimestamp passed to VTCompressionSessionEncodeFrame; it seems to affect the bitrate. If you change the frame rate, change the frameTimestamp accordingly.
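The point above is that the presentation timestamps should advance at the actual frame rate, or the rate control mis-estimates output size per second. A minimal sketch of consistent pts generation, in milliseconds to match the question's `CMTimeMake(capturetime, 1000)` (the helper name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Presentation timestamp (ms, i.e. timescale 1000) for the Nth frame
// at a given frame rate. If fps changes mid-session, timestamps must
// keep advancing at the new per-frame interval.
int64_t ptsForFrameMs(int64_t frameIndex, int64_t fps) {
    return frameIndex * 1000 / fps;
}
```

At 30 fps, frame 30 should carry pts 1000 ms; if you drop to 15 fps but keep feeding 33 ms increments, the encoder believes it has twice the frame budget per second.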

Related

Multithreading in HM reference software

Encoding UHD sequences with the HEVC HM reference software takes days on CPUs, even on powerful machines. I want to know whether it is possible to increase the number of threads (even if it decreases encoding quality) to speed up the process (I want at least a 4x speedup).
Is this possible by increasing the number of tiles, since by default there is only one tile per picture, or should I change the source code? And where exactly?
It seems the answer to increasing encoding speed was not the number of tiles but WPP.
HM allows increasing the number of tiles on the condition that the minimum tile width is 4 CTUs (4*64 pixels) and the minimum height is 1 CTU (64 pixels), so you can't just choose any number.
When you activate WPP, you can process up to 17 CTU rows at the same time, but you cannot use WPP and tiles at the same time.
Testing this with the BasketballDrive HD sequence at QP=37:

          T (sec)      Rate (kbps)  PSNR
1 tile    171013.381   1761.7472    34.5743
4 tiles   166401.603   1822.1880    34.5439   (saves about 3 hours)
WPP       166187.201   1785.4048    34.5483   (about the same)
It could save more with UHD sequences, but it's not enough for me. 3 hours is nothing for JEM, and WPP was removed from the new VTM (FVC).
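For reference, the tile/WPP switches discussed above are set in the HM encoder configuration file. A hedged sketch (key names vary between HM versions, and the values here are illustrative):

```
#======== Slices / Tiles / WPP =========
UniformSpacingIdc     : 1   # evenly spaced tiles
NumTileColumnsMinus1  : 1   # 2x2 grid -> 4 tiles per picture
NumTileRowsMinus1     : 1
WaveFrontSynchro      : 0   # set to 1 for WPP (tiles must then be off)
```

Note that tiles/WPP only create parallelizable partitions; the encoder build must actually run them on multiple threads to get the speedup.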

How to record at a low sample (around 1000 Hz) on an iPhone

I am writing an app to record single-channel audio with the built-in microphone on an iPhone 6. The app works as expected when configured to record at 8000 Hz. Here's the code:
// Set up audio session
let session = AVAudioSession.sharedInstance()

// Configure audio session
do {
    try session.setCategory(AVAudioSessionCategoryPlayAndRecord)

    var recordSettings = [String: AnyObject]()
    recordSettings[AVFormatIDKey] = Int(kAudioFormatLinearPCM) as AnyObject
    // Set the sampling rate
    recordSettings[AVSampleRateKey] = 8000.0 as AnyObject
    recordSettings[AVNumberOfChannelsKey] = 1 as AnyObject

    recorder = try AVAudioRecorder(url: outputFileURL, settings: recordSettings)
    recorder?.delegate = self
    recorder?.isMeteringEnabled = true
    recorder?.prepareToRecord()
    return true
}
catch {
    throw Error.AVConfiguration
}
To reduce the storage requirements, I would like to record at a much lower sample rate (ideally less than 1000 Hz). If I set the sample rate to 1000 Hz, the app records at 8000 Hz.
According to Apple's documentation,
The available range for hardware sample rate is device dependent. It typically ranges from 8000 through 48000 hertz.
Question: is it possible to use AVAudioSession (or another framework) to record audio at a low sample rate?
Audio recording on the iPhone is done with hardware codecs, so the available sample rates are hardcoded and can't be changed. But if you need a 1 kHz sample rate, you can record at 8 kHz and then just resample the recording with a resampling library. Personally, I prefer to use ffmpeg for such tasks.
I hope you are aware that, by the Nyquist theorem, you cannot expect very useful results from what you are trying to achieve.
That is, unless you are targeting low frequencies only. In that case you might want to apply a low-pass filter first. It's almost impossible to understand voices with only frequencies below 500 Hz. Speech is usually said to require 3 kHz, which makes for a sample rate of 6000 Hz.
For an example of what you'd have to expect try something similar to:
ffmpeg -i tst.mp3 -ar 1000 tst.wav
with e.g. some vocals and listen to the result.
You can, however, possibly achieve an acceptable trade-off using e.g. a sample rate of 3000 Hz.
An alternative would be to do some compression on the fly, as @manishg suggested. Since smartphones these days can do video compression in real time, that should be totally feasible with the iPhone's hardware and software. But it's a totally different thing from reducing the sample rate.
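The record-high-then-resample approach above can be illustrated with a deliberately crude decimator: average over the decimation factor (a primitive low-pass, per the filtering caveat above) and keep one sample per group. A real resampler (ffmpeg, libsamplerate) uses a proper FIR filter; this is only a sketch of the 8 kHz to 1 kHz reduction:

```cpp
#include <cstddef>
#include <vector>

// Downsample by an integer factor: box-filter (moving average over the
// factor) then decimate. Input samples are normalized floats.
std::vector<float> downsample(const std::vector<float>& in, std::size_t factor) {
    std::vector<float> out;
    for (std::size_t i = 0; i + factor <= in.size(); i += factor) {
        float sum = 0.0f;
        for (std::size_t j = 0; j < factor; ++j) sum += in[i + j]; // crude low-pass
        out.push_back(sum / factor);
    }
    return out;
}
```

With `factor = 8`, a buffer recorded at 8000 Hz comes out at 1000 Hz, one-eighth the storage, at the fidelity cost the Nyquist discussion above describes.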

Bitrate is not getting limited for H.264 HW accelerated encode on iOS using the VideoToolbox API

Bitrate is not getting limited for H.264 HW accelerated encode on iOS using the VideoToolbox API with property kVTCompressionPropertyKey_AverageBitRate.
It is observed that the bitrate shoots up to 4 Mbps at times (for both 1280x780 and 640x360) for H.264 HW accelerated encode, even though the encoder's bitrate is configured correctly.
This high bitrate value is not within acceptable limits.
There is a single property for setting the bitrate, i.e. kVTCompressionPropertyKey_AverageBitRate, available in VideoToolbox. The documentation says: "This is not a hard limit; the bit rate may peak above this."
I have tried below two things :
1. Set the bitrate and data rate to hardcoded values as part of the encoderSpecification attribute of VTCompressionSessionCreate during init, and removed any reconfiguring of the bitrate after init.
2. Set the bitrate and data rate at run time using VTSessionSetProperty.
Neither seems to work.
Is there any way to restrict the bitrate to certain limit ? Any help is greatly appreciated.
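One lever not mentioned in the question is kVTCompressionPropertyKey_DataRateLimits, which takes a (bytes, seconds) pair and acts as a harder cap than AverageBitRate. Deriving the byte count from a target bitrate is simple arithmetic; the helper name and the 1.5x headroom factor below are assumptions for illustration, not VideoToolbox requirements:

```cpp
#include <cstdint>

// Byte budget for a DataRateLimits-style (bytes, seconds) pair:
// target bits/s -> bytes/s, scaled by the window length, with some
// headroom so the encoder is not starved on scene changes.
int64_t dataRateLimitBytes(int64_t avgBitrateBps, int64_t windowSeconds, double slack) {
    return static_cast<int64_t>(avgBitrateBps / 8.0 * windowSeconds * slack);
}
```

For a 1 Mbps target over a 1-second window with 1.5x headroom this yields 187500 bytes; shorter windows clamp peaks harder but hurt quality on complex frames.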
If you are dealing with a high-motion scene, 4 Mbps may well be the right value. In a non-real-time situation, I think you should try configuring the profile to High with Level 5, setting H264EntropyMode to CABAC, and extending the value of the MaxKeyFrameInterval key.

Get peak volume of audio input on iOS

On iOS 7, how do I get the current microphone input volume in a range between 0 and 1?
I've seen several approaches like this one, but the results I get baffle me.
The return values of peakPowerForChannel: are documented to be in the range of -160 to 0 with 0 being the loudest and -160 near absolute silence.
Problem: Given a quiet room and a short but loud noise, the power goes all the way up in an instant but takes a very long time to drop back to the quiet level (way longer than the actual noise...).
What I want: Essentially I want an exact copy of the Audio Input patch of Quartz Composer with its Volume Peak output. Any tips?
To get a similar volume peak measurement, you might have to capture raw audio via the iOS Audio Queue API (or the RemoteIO Audio Unit) and analyze the raw PCM waveform samples in each audio callback, looking for a magnitude maximum over your desired frame width or analysis time.
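The per-callback analysis suggested above reduces to a scan over the PCM buffer. A sketch, assuming normalized float samples (so the peak is already in the 0..1 range the question asks for), with an optional mapping to the -160..0 dB scale of peakPowerForChannel::

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Peak absolute sample magnitude in one callback's buffer.
// For normalized float PCM this is directly a 0..1 value.
float peakMagnitude(const std::vector<float>& samples) {
    float peak = 0.0f;
    for (float s : samples) peak = std::max(peak, std::fabs(s));
    return peak;
}

// Map a 0..1 magnitude onto the dBFS scale used by peakPowerForChannel:
// (0 dB loudest, -160 dB near silence).
float peakDb(float magnitude) {
    return magnitude > 0.0f ? 20.0f * std::log10(magnitude) : -160.0f;
}
```

Because this recomputes the peak per buffer, it drops back to the quiet-room level on the very next callback, without the slow decay the question observes in the metering API.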

VTCompressionSessionEncodeFrame: last seconds are lost?

I am using VTCompressionSessionEncodeFrameWithOutputHandler to compress pixel buffers from camera into raw h264 stream. I am using kVTEncodeFrameOptionKey_ForceKeyFrame to be sure that every output from VTCompressionSessionEncodeFrame is not dependent on other pieces. Also, there is kVTCompressionPropertyKey_AllowFrameReordering = false, kVTCompressionPropertyKey_RealTime = true options during session initialization and VTCompressionSessionCompleteFrames called after each VTCompressionSessionEncodeFrame call.
I also collect samples, produced by VTCompressionSessionEncodeFrame and periodically save them as MP4 file (using Bento4 library).
But the final track is always 1-2 seconds shorter than the samples fed to VTCompressionSessionEncodeFrame. After several attempts to resolve this, I became convinced that VTCompressionSessionEncodeFrame outputs frames that depend on later frames to be decoded properly, so those frames are lost, since they cannot be used to produce the "final chunks" of the track.
So the question - how one can force VTCompressionSessionEncodeFrame to produce totally independent data chunks?
Turns out this was... an FPS issue! NAL units carry no timing themselves (aside from the pts, which in my case was bound to the capture FPS), so it is quite important that they are produced at exactly the rate the movie's FPS expects. Nothing was lost; the saved frames were just played back faster (this was not easy to spot, in fact).
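The effect described in this resolution is plain arithmetic: if the muxer assumes a higher FPS than the capture rate, the same number of frames spans less wall-clock time, so the track looks 1-2 seconds "short" even though every frame is present. A minimal illustration (helper name is hypothetical):

```cpp
// Duration of a track containing frameCount frames when played back
// at an assumed fps. No frames need to be lost for the duration to
// shrink; a wrong fps alone accounts for it.
double trackDurationSeconds(int frameCount, double playbackFps) {
    return frameCount / playbackFps;
}
```

300 frames captured over 10 s at 30 fps, but muxed as 35 fps, yield a track of roughly 8.6 s: the "missing" 1.4 s is timing, not data.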