Invalid video decoder config in MediaSource

I'm using MSE to play fragmented MP4 streams (H.264 video) in the browser.
The setup works: there is a MediaSource and a SourceBuffer, I push data into the SourceBuffer, and the MediaSource content is displayed on the HTML page correctly.
However, I've now found a stream that my configuration simply can't play.
I'd like to emphasize that my MSE configuration is good and working for most streams, in fact for all the streams I've tried until now, so I'll skip the implementation details for the sake of simplicity.
There is an error message with a lot of details:
CHUNK_DEMUXER_ERROR_APPEND_FAILED: Invalid video decoder config: codec: h264, profile: h264 baseline, level: not available, alpha_mode: is_opaque, coded size: [0,0], visible rect: [0,0,0,0], natural size: [0,0], has extra data: false, encryption scheme: Unencrypted, rotation: 0°, flipped: 0, color space: {primaries:BT709, transfer:BT709, matrix:BT709, range:LIMITED}
It seems the video itself doesn't have the correct size information.
So the obvious question: (How) is it possible to configure the MediaSource's video decoder to update the stream's size (width and height) parameters?

This looks like a problem with the video bitstream of that particular piece of content, more specifically with the decoder initialization config, which is usually carried in special NAL units (SPS and PPS) in the initialization segment(s).
You probably won't be able to work around that on the client. Fixing it would most likely mean rewriting those NAL units in the bitstream, which is not something you would typically do on the client side; it's a content-authoring issue.
Also, you might want to cross-validate with https://conformance.dashif.org/ or the dash.js reference player.

Related

Read HLS Playlist information to dynamically change the preferredBitRate of an Item

I'm working on a video app; we are changing from regular MP4 files to HLS. One of the many reasons for the change is that we have much more control over the bandwidth usage of videos (we load lots of other stuff in our player, so we need to optimize the experience as best we can).
So, AVFoundation introduced in iOS10 the ability to control the bandwidth using:
AVPlayerItem *playerItem = [AVPlayerItem playerItemWithAsset:self.urlAsset];
playerItem.preferredForwardBufferDuration = 30.0;
playerItem.preferredPeakBitRate = 200000.0; // Remember this line
There's also a setting introduced in iOS 11 to cap the item's resolution, preferredMaximumResolution. We're using it, but we still need a solution for iOS 10 devices.
So, now we have control over preferredPeakBitRate, which is nice, but we have a problem: not all the HLS sources are generated by us. Let's say we want a maximum resolution of 480p when the user is not on Wi-Fi; today I have no way to achieve that, because I won't always know how much bandwidth the 480p rendition of the selected HLS playlist needs.
One thing I was thinking about is reading the information inside the m3u8 file, to at least know which quality renditions my player can show and how much bandwidth each one needs.
One way to do this would be to download the m3u8 playlist as plain text and parse it with a regex. I'm trying to avoid that; I think it should be far less difficult than that (a rough sketch of this manual approach follows the examples below).
I cannot read this information from the tracks, because a) I can't find the information there, and b) the tracks are replaced dynamically when the quality changes (yes, one track per quality level).
So I don't know how to get this information. I've searched Google and Stack Overflow and can't find it. Can anyone help me?
Here's an example of what I want to do. I have this example playlist:
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=314000,RESOLUTION=228x128,CODECS="mp4a.40.2"
test-hls-1-16a709300abeb08713a5cada91ab864e_hls_duplex_192k.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=478000,RESOLUTION=400x224,CODECS="avc1.42001e,mp4a.40.2"
test-hls-1-16a709300abeb08713a5cada91ab864e_hls_duplex_400k.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=691000,RESOLUTION=480x270,CODECS="avc1.42001e,mp4a.40.2"
test-hls-1-16a709300abeb08713a5cada91ab864e_hls_duplex_600k.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1120000,RESOLUTION=640x360,CODECS="avc1.4d001f,mp4a.40.2"
test-hls-1-16a709300abeb08713a5cada91ab864e_hls_duplex_1000k.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1661000,RESOLUTION=960x540,CODECS="avc1.4d001f,mp4a.40.2"
test-hls-1-16a709300abeb08713a5cada91ab864e_hls_duplex_1500k.m3u8
And I just want to have that information available in an array inside my code, something like this:
NSArray<ZZMetadata *> *metadataArray = self.urlAsset.bandwidthMetadata;
NSLog(@"Metadata info: %@", metadataArray);
And print something like this:
<__NSArrayM 0x123456789> (
    <ZZMetadata 0x234567890> {
        trackId: 1
        neededBandwidth: 314000
        resolution: 228x128
        codecs: ...
        ...
    },
    <ZZMetadata 0x345678901> {
        trackId: 2
        neededBandwidth: 478000
        resolution: 400x224
    },
    ...
)
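In case it helps, here is a rough Swift sketch of the manual approach: download the master playlist as text, then parse the #EXT-X-STREAM-INF attributes. This is not an AVFoundation API; the VariantInfo type, the function names, and the parsing rules are assumptions for illustration only, and a real parser should cover more of the HLS attribute grammar.

import Foundation

// Hypothetical sketch (not an AVFoundation API): parse #EXT-X-STREAM-INF
// attributes from a master playlist that was downloaded as plain text.
struct VariantInfo {
    let bandwidth: Int
    let resolution: String?
    let codecs: String?
    let uri: String
}

func parseMasterPlaylist(_ text: String) -> [VariantInfo] {
    var variants: [VariantInfo] = []
    var pending: [String: String]?

    for line in text.components(separatedBy: .newlines) {
        if line.hasPrefix("#EXT-X-STREAM-INF:") {
            // Split the attribute list on commas, but keep quoted values
            // (e.g. CODECS="avc1.42001e,mp4a.40.2") in one piece.
            let attributeText = String(line.dropFirst("#EXT-X-STREAM-INF:".count))
            var attrs: [String: String] = [:]
            var current = ""
            var inQuotes = false
            for ch in attributeText {
                if ch == "\"" { inQuotes.toggle() }
                if ch == "," && !inQuotes {
                    addAttribute(current, to: &attrs)
                    current = ""
                } else {
                    current.append(ch)
                }
            }
            addAttribute(current, to: &attrs)
            pending = attrs
        } else if let attrs = pending, !line.isEmpty, !line.hasPrefix("#") {
            // The variant's URI is the first non-comment line that follows.
            variants.append(VariantInfo(bandwidth: Int(attrs["BANDWIDTH"] ?? "") ?? 0,
                                        resolution: attrs["RESOLUTION"],
                                        codecs: attrs["CODECS"],
                                        uri: line))
            pending = nil
        }
    }
    return variants
}

private func addAttribute(_ pair: String, to attrs: inout [String: String]) {
    let parts = pair.split(separator: "=", maxSplits: 1)
    guard parts.count == 2 else { return }
    attrs[String(parts[0])] = parts[1].trimmingCharacters(in: CharacterSet(charactersIn: "\""))
}

Run against the example playlist above, this yields five entries (bandwidth 314000 through 1661000); you could then pick the largest variant whose BANDWIDTH fits the current network budget and assign that value to preferredPeakBitRate.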

Playing multi-sampled Instruments using AudioKit, controlling ADSR envelope

I'm trying to play an instrument built from several .wav samples using AudioKit.
Here's what I've tried so far:
Using AKSampler (with an underlying AVAudioUnitSampler): it worked fine, but I can't figure out how to control the ADSR envelope here; calling stop will stop the note immediately.
Another way is to use an AKSamplePlayer for each sample and play it, manually setting the rate so it plays the right note. I could then (possibly?) connect an AKAmplitudeEnvelope to each sample player. But if I want to play 5 notes of the same sample simultaneously, I would need 5 instances of AKSamplePlayer, which seems like a waste of resources.
I also tried to find a way to just push raw audio samples to the AudioKit output buffer, doing the mixing and sample interpolation myself (in C, probably?), but I didn't find out how to do it :(
What is the right way to make a multi-sampled instrument using AudioKit? I feel like it should be a fairly simple task.
Thanks to mahal tertin, it's pretty easy to use AKAUPresetBuilder!
You can create an .aupreset file somewhere in the tmp directory and then load this instrument with AKSampler.
The only thing worth noting is that by default AKAUPresetBuilder will generate samples with the trigger mode set to trigger, which ignores note-off events, so you should set it explicitly.
For example:
let sampleC4 = AKAUPresetBuilder.generateDictionary(
    rootNote: 60,
    filename: pathToC4WavSample,
    startNote: 48,
    endNote: 65)
sampleC4["triggerMode"] = "hold"

let sampleC5 = AKAUPresetBuilder.generateDictionary(
    rootNote: 72,
    filename: pathToC5WavSample,
    startNote: 66,
    endNote: 83)
sampleC5["triggerMode"] = "hold"

AKAUPresetBuilder.createAUPreset(
    dict: [sampleC4, sampleC5],
    path: pathToAUPresetFilename,
    instrumentName: "My Instrument",
    attack: 0,
    release: 0.2)
and then create a sampler and start AudioKit:
sampler = AKSampler()
try sampler.loadInstrument(atPath: pathToAUPresetFilename)
AudioKit.output = sampler
AudioKit.start()
and then use this to start playing a note:
sampler.play(noteNumber: MIDINoteNumber(63), velocity: MIDIVelocity(120), channel: 0)
and this to stop it, respecting the release parameter:
sampler.stop(noteNumber: MIDINoteNumber(63), channel: 0)
Probably the best way would be to embed your wav files into an EXS or SoundFont format and use tools in that realm to handle the ADSR, for instance. Otherwise you'll kind of have to have an instrument for each sample.

How to save just raw PCM to file with iOS SDK (Core Audio)?

I'm converting an MP3 file into raw PCM, and I need to save it as just raw PCM. (Note: I'm using Java/RoboVM to port to iOS.)
I'm using the coreaudio package, and the relevant part of my code looks like this:
// Define the output PCM format.
AudioStreamBasicDescription outputFormat = new AudioStreamBasicDescription();
outputFormat.setFormat(AudioFormat.LinearPCM);
outputFormat.setFormatFlags(AudioFormatFlags.Canonical);
outputFormat.setBitsPerChannel(16);
outputFormat.setChannelsPerFrame(1);
outputFormat.setFramesPerPacket(1);
outputFormat.setBytesPerFrame(2);
outputFormat.setBytesPerPacket(2);
outputFormat.setSampleRate(22050);
// ...
outputFile = ExtAudioFile.create(outputFileURL, AudioFileType.CAF, outputFormat, null, AudioFileFlags.EraseFile);
I then run through a loop, reading from the MP3 file and writing to the output file.
Upon importing the resulting file into Audacity, I notice it always has a spike at the start, indicating that it's not actually a raw PCM file but is instead wrapped in a container with a header (whether WAV or CAF, etc.).
I understand I could just strip the header off afterwards to get the raw PCM data, but for the sake of space and performance in this part of my app I'd love to keep it simple and save the raw PCM data as-is without a wrapper; I just don't know how to go about doing that.
The issue arises here:
outputFile = ExtAudioFile.create(outputFileURL, AudioFileType.CAF, outputFormat, null, AudioFileFlags.EraseFile);
There aren't many choices for AudioFileType; I've tried WAVE and CAF. Ideally there would be a PCM or RAW option, but there isn't one. Is there a specific AudioFileType I should choose, or do I need to go about this another way?
The Extended Audio File Services framework doesn't support a "raw" PCM format.
For an application to understand a PCM format, it needs to know things like:
How many channels are there
Are they interleaved or not
What is the sample rate
Is the data floating point or not
What is the bit depth
etc...
In fact, on iOS and OS X, AudioStreamBasicDescription is the struct that tells you what is required to interpret a PCM stream. For this reason, a "raw PCM" format doesn't really work; it needs at least some metadata. The closest formats to raw PCM are WAV, AIFF and CAF. If these don't serve your purposes, you'll have to create a custom file format, but that doesn't need to be difficult.
The Extended Audio File Services APIs are quite configurable. After opening an audio file for reading (ExtAudioFileOpenURL) you can set various properties on the ExtAudioFileRef handle.
In your case, consider setting kExtAudioFileProperty_ClientDataFormat. This property controls the format of the PCM data read from the file. As ExtAudioFileRead decodes the input file, it converts the data it returns to the format you specify. There are some limitations to this method; IIRC, it does not support sample rate conversion and the like.
As you read the properly decoded data, you can then use something like NSOutputStream to write the "raw PCM" format of your choice directly to a file with no metadata at all.
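To make that concrete, here is a rough Swift sketch (rather than the RoboVM bindings used in the question) of that flow under the assumptions above: open the source file, set kExtAudioFileProperty_ClientDataFormat to the 16-bit mono 22050 Hz layout from the question, and append the decoded bytes to a plain file with no header. The writeRawPCM name and the buffer size are arbitrary.

import AudioToolbox
import Foundation

// Sketch only: decode inputURL (e.g. an MP3) to 16-bit mono PCM at 22050 Hz
// and write the bare sample bytes, with no header, to outputURL.
func writeRawPCM(from inputURL: URL, to outputURL: URL) throws {
    var extFile: ExtAudioFileRef?
    var status = ExtAudioFileOpenURL(inputURL as CFURL, &extFile)
    guard status == noErr, let file = extFile else {
        throw NSError(domain: NSOSStatusErrorDomain, code: Int(status))
    }
    defer { ExtAudioFileDispose(file) }

    // Ask Extended Audio File Services to hand back decoded PCM in this format.
    var clientFormat = AudioStreamBasicDescription(
        mSampleRate: 22050,
        mFormatID: kAudioFormatLinearPCM,
        mFormatFlags: kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked,
        mBytesPerPacket: 2, mFramesPerPacket: 1, mBytesPerFrame: 2,
        mChannelsPerFrame: 1, mBitsPerChannel: 16, mReserved: 0)
    status = ExtAudioFileSetProperty(file,
                                     kExtAudioFileProperty_ClientDataFormat,
                                     UInt32(MemoryLayout<AudioStreamBasicDescription>.size),
                                     &clientFormat)
    guard status == noErr else {
        throw NSError(domain: NSOSStatusErrorDomain, code: Int(status))
    }

    FileManager.default.createFile(atPath: outputURL.path, contents: nil)
    let out = try FileHandle(forWritingTo: outputURL)
    defer { out.closeFile() }

    let framesPerRead: UInt32 = 4096
    var buffer = Data(count: Int(framesPerRead) * Int(clientFormat.mBytesPerFrame))
    while true {
        var frameCount = framesPerRead
        let readStatus: OSStatus = buffer.withUnsafeMutableBytes { raw in
            var list = AudioBufferList(
                mNumberBuffers: 1,
                mBuffers: AudioBuffer(mNumberChannels: 1,
                                      mDataByteSize: UInt32(raw.count),
                                      mData: raw.baseAddress))
            return ExtAudioFileRead(file, &frameCount, &list)
        }
        guard readStatus == noErr else {
            throw NSError(domain: NSOSStatusErrorDomain, code: Int(readStatus))
        }
        if frameCount == 0 { break }   // end of input
        out.write(buffer.prefix(Int(frameCount) * Int(clientFormat.mBytesPerFrame)))
    }
}

The output is exactly the interleaved sample bytes, so whoever reads the file back must already know the channel count, sample rate and bit depth, which is the point made above.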

How to decode a H.264 frame on iOS by hardware decoding?

I have been using ffmpeg to decode every single frame that I receive from my IP camera. The brief code looks like this:
-(void)decodeFrame:(unsigned char *)frameData frameSize:(int)frameSize {
    AVFrame frame;
    AVPicture picture;
    AVPacket pkt;
    AVCodecContext *context;   // assume this is allocated and opened elsewhere
    int got_picture = 0;

    av_init_packet(&pkt);
    pkt.data = frameData;
    pkt.size = frameSize;

    avcodec_get_frame_defaults(&frame);
    avpicture_alloc(&picture, PIX_FMT_RGB24, targetWidth, targetHeight);
    avcodec_decode_video2(context, &frame, &got_picture, &pkt);
}
The code works fine, but it's software decoding. I want to improve the decoding performance with hardware decoding. After lots of research, I learned it may be achievable with the AVFoundation framework.
The AVAssetReader class may help, but I can't figure out what to do next. Could anyone point out the next steps for me? Any help would be appreciated.
iOS does not provide any public access directly to the hardware decode engine, because hardware is always used to decode H.264 video on iOS.
Therefore, WWDC 2014 session 513 gives you all the information you need to do frame-by-frame decoding on iOS. In short, per that session:
1. Generate individual network abstraction layer units (NALUs) from your H.264 elementary stream. There is much information on how this is done online. VCL NALUs (IDR and non-IDR) contain your video data and are to be fed into the decoder.
2. Re-package those NALUs according to the "AVCC" format, removing NALU start codes and replacing them with a 4-byte NALU length header.
3. Create a CMVideoFormatDescriptionRef from your SPS and PPS NALUs via CMVideoFormatDescriptionCreateFromH264ParameterSets().
4. Package NALU frames as CMSampleBuffers per session 513.
5. Create a VTDecompressionSessionRef, and feed VTDecompressionSessionDecodeFrame() with the sample buffers (a condensed sketch of steps 3 and 5 follows below).
Alternatively, use AVSampleBufferDisplayLayer, whose -enqueueSampleBuffer: method obviates the need to create your own decoder.
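For illustration, a condensed Swift sketch of steps 3 and 5, assuming sps and pps already hold the raw parameter-set NALUs with start codes removed; the function name, the NV12 pixel format and the trimmed error handling are choices made here, not part of the session:

import CoreMedia
import CoreVideo
import VideoToolbox

// Condensed sketch: `sps` and `pps` are the raw parameter-set NALUs
// (start codes already removed).
func makeDecompressionSession(sps: Data, pps: Data)
        -> (CMVideoFormatDescription, VTDecompressionSession)? {
    var formatDesc: CMVideoFormatDescription?
    let status = sps.withUnsafeBytes { spsRaw -> OSStatus in
        pps.withUnsafeBytes { ppsRaw -> OSStatus in
            let pointers = [spsRaw.bindMemory(to: UInt8.self).baseAddress!,
                            ppsRaw.bindMemory(to: UInt8.self).baseAddress!]
            let sizes = [sps.count, pps.count]
            return CMVideoFormatDescriptionCreateFromH264ParameterSets(
                allocator: kCFAllocatorDefault,
                parameterSetCount: 2,
                parameterSetPointers: pointers,
                parameterSetSizes: sizes,
                nalUnitHeaderLength: 4,   // matches the 4-byte AVCC length prefix
                formatDescriptionOut: &formatDesc)
        }
    }
    guard status == noErr, let desc = formatDesc else { return nil }

    // Request NV12 output buffers; pick whatever your renderer needs.
    let attrs = [kCVPixelBufferPixelFormatTypeKey as String:
                     kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange]
    var session: VTDecompressionSession?
    let createStatus = VTDecompressionSessionCreate(
        allocator: kCFAllocatorDefault,
        formatDescription: desc,
        decoderSpecification: nil,
        imageBufferAttributes: attrs as CFDictionary,
        outputCallback: nil,   // with a nil callback, decode via the output-handler variant of VTDecompressionSessionDecodeFrame
        decompressionSessionOut: &session)
    guard createStatus == noErr, let session = session else { return nil }
    return (desc, session)
}

Each AVCC-packaged access unit then goes into a CMBlockBuffer/CMSampleBuffer and is passed to VTDecompressionSessionDecodeFrame, or handed directly to AVSampleBufferDisplayLayer as in the alternative above.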
Edit:
This link provides a more detailed explanation of how to decode H.264 step by step: stackoverflow.com/a/29525001/3156169
Original answer:
I watched session 513, "Direct Access to Video Encoding and Decoding", from WWDC 2014 yesterday, and got the answer to my own question.
The speaker says:
We have Video Toolbox (in iOS 8). Video Toolbox has been there on OS X for a while, but now it's finally populated with headers on iOS. This provides direct access to encoders and decoders.
So, there is no way to do hardware decoding frame by frame in iOS 7, but it can be done in iOS 8.
Has anyone figured out how to directly access video encoding and decoding frame by frame in iOS 8?

How to utilize hardware decode for audio?

I have a buffer that contains packets read by ffmpeg from a video file encoded with H.264/AAC.
According to the Apple documentation, an audio stream encoded in AAC can be decoded with hardware support.
How do I decode the audio stream with hardware support?
UPDATE: I use Audio Queue Services to output the audio. Right now I decode the AAC packets using ffmpeg and send LPCM audio to the audio queue. According to the Apple documentation, I can send AAC audio directly to the queue and it will take care of the decoding. Does it decode in hardware? Do I need to set an Audio Queue parameter to enable hardware audio decoding, and if so, how?
You can tell the system not to use hardware decoding, but probably not the other way around. From the documentation, kAudioFormatProperty_HardwareCodecCapabilities is the constant for determining which hardware codecs can be used:
enum {
    kAudioFormatProperty_HardwareCodecCapabilities = 'hwcc',
};
Constants
kAudioFormatProperty_HardwareCodecCapabilities
A UInt32 value indicating the number of codecs from the specified list that can be used, if the application were to begin using them in the specified order. Set the inSpecifier parameter to an array of AudioClassDescription structures that describes a set of one or more audio codecs. If the property value is the same as the size of the array in the inSpecifier parameter, all of the specified codecs can be used.
Available in iOS 3.0 and later.
Declared in AudioFormat.h.
Discussion
Use this property to determine whether a desired set of codecs can be simultaneously instantiated.
Hardware-based codecs can be used only when playing or recording using Audio Queue Services or using interfaces, such as AV Foundation, which use Audio Queue Services. In particular, you cannot use hardware-based audio codecs with OpenAL or when using the I/O audio unit.
When describing the presence of a hardware codec, the system does not consider the current audio session category. Some categories disallow the use of hardware codecs. A set of hardware codecs is considered available, by this constant, based only on whether the hardware supports the specified combination of codecs.
Some codecs may be available in both hardware and software implementations. Use the kAudioFormatProperty_Encoders and kAudioFormatProperty_Decoders constants to determine whether a given codec is present, and whether it is hardware or software-based.
Software-based codecs can always be instantiated, so there is no need to use this constant when using software encoding or decoding.
The following code example illustrates how to check whether or not a hardware AAC encoder and a hardware AAC decoder are available, in that order of priority:
AudioClassDescription requestedCodecs[2] = {
    {
        kAudioEncoderComponentType,
        kAudioFormatAAC,
        kAppleHardwareAudioCodecManufacturer
    },
    {
        kAudioDecoderComponentType,
        kAudioFormatAAC,
        kAppleHardwareAudioCodecManufacturer
    }
};

UInt32 successfulCodecs = 0;
UInt32 size = sizeof (successfulCodecs);
OSStatus result = AudioFormatGetProperty (
    kAudioFormatProperty_HardwareCodecCapabilities,
    requestedCodecs,
    sizeof (requestedCodecs),
    &size,
    &successfulCodecs
);

switch (successfulCodecs) {
    case 0:
        // aac hardware encoder is unavailable. aac hardware decoder availability
        // is unknown; could ask again for only aac hardware decoding
        break;
    case 1:
        // aac hardware encoder is available but, while using it, no hardware
        // decoder is available.
        break;
    case 2:
        // hardware encoder and decoder are available simultaneously
        break;
}
https://github.com/mooncatventures-group/sampleDecoder
You're probably better off using Audio Units rather than an audio queue, though.
You can, though as usual with Core Audio there are various caveats and edge cases to watch for.
Set the property kExtAudioFileProperty_CodecManufacturer to kAppleHardwareAudioCodecManufacturer. Do this before you set the client data format.
There are some docs in ExtendedAudioFile.h.
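For example, a minimal Swift sketch of that property call, assuming you already have an open ExtAudioFileRef (the helper name is made up):

import AudioToolbox

// Ask Extended Audio File Services to prefer the Apple hardware codec.
// Do this before setting kExtAudioFileProperty_ClientDataFormat.
func preferHardwareCodec(for file: ExtAudioFileRef) -> OSStatus {
    var manufacturer = UInt32(kAppleHardwareAudioCodecManufacturer)
    return ExtAudioFileSetProperty(file,
                                   kExtAudioFileProperty_CodecManufacturer,
                                   UInt32(MemoryLayout<UInt32>.size),
                                   &manufacturer)
}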
Also, rather than doing this calculation, you can just force a very large buffer size here:
status = AudioQueueAllocateBufferWithPacketDescriptions(
    audioQueue_,
    _audioCodecContext->bit_rate * kAudioBufferSeconds / 8,
    _audioCodecContext->sample_rate * kAudioBufferSeconds /
        _audioCodecContext->frame_size + 1,
    &audioQueueBuffer_[i]);
Found this:
https://developer.apple.com/library/ios/qa/qa1663/_index.html
Since AudioFormatGetProperty doesn't always work, the page above describes how to use AudioFormatGetPropertyInfo for the encoder or decoder and detect whether a hardware or software implementation is present.
