Understanding Remote I/O AudioStreamBasicDescription (ASBD) - signal-processing

I need help understanding the following ASBD. It's the default ASBD assigned to a fresh instance of RemoteIO (I got it by executing AudioUnitGetProperty(..., kAudioUnitProperty_StreamFormat, ...) on the RemoteIO audio unit, right after allocating and initializing it).
Float64 mSampleRate 44100
UInt32 mFormatID 1819304813
UInt32 mFormatFlags 41
UInt32 mBytesPerPacket 4
UInt32 mFramesPerPacket 1
UInt32 mBytesPerFrame 4
UInt32 mChannelsPerFrame 2
UInt32 mBitsPerChannel 32
UInt32 mReserved 0
The question is, shouldn't mBytesPerFrame be 8? If I have 32 bits (4 bytes) per channel, and 2 channels per frame, shouldn't each frame be 8 bytes long (instead of 4)?
Thanks in advance.

The value of mBytesPerFrame depends on mFormatFlags. From CoreAudioTypes.h:
Typically, when an ASBD is being used, the fields describe the complete layout
of the sample data in the buffers that are represented by this description -
where typically those buffers are represented by an AudioBuffer that is
contained in an AudioBufferList.
However, when an ASBD has the kAudioFormatFlagIsNonInterleaved flag, the
AudioBufferList has a different structure and semantic. In this case, the ASBD
fields will describe the format of ONE of the AudioBuffers that are contained in
the list, AND each AudioBuffer in the list is determined to have a single (mono)
channel of audio data. Then, the ASBD's mChannelsPerFrame will indicate the
total number of AudioBuffers that are contained within the AudioBufferList -
where each buffer contains one channel. This is used primarily with the
AudioUnit (and AudioConverter) representation of this list - and won't be found
in the AudioHardware usage of this structure.

I believe that because the format flags specify kAudioFormatFlagIsNonInterleaved, it follows that a frame within any single buffer can only be the size of a one-channel frame. If this is correct, mChannelsPerFrame is certainly a confusing name.
I hope someone else will confirm / clarify this.
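For what it's worth, the dump above can be decoded directly: mFormatID 1819304813 is the four-character code 'lpcm', and mFormatFlags 41 is kAudioFormatFlagIsFloat (1) | kAudioFormatFlagIsPacked (8) | kAudioFormatFlagIsNonInterleaved (32). Here is a small sketch of how you might sanity-check the layout (the function name is mine, not from the question):

#include <AudioToolbox/AudioToolbox.h>
#include <stdbool.h>
#include <stdio.h>

// For non-interleaved LPCM, mBytesPerFrame describes ONE mono buffer,
// so the total size of a time slice across all channels is
// mBytesPerFrame * mChannelsPerFrame (4 * 2 = 8 for the dump above).
static void PrintASBDLayout(const AudioStreamBasicDescription *asbd) {
    bool deinterleaved = (asbd->mFormatFlags & kAudioFormatFlagIsNonInterleaved) != 0;
    UInt32 totalBytesPerFrame = deinterleaved
        ? asbd->mBytesPerFrame * asbd->mChannelsPerFrame
        : asbd->mBytesPerFrame;
    printf("non-interleaved: %d, bytes per frame across all channels: %u\n",
           deinterleaved, (unsigned)totalBytesPerFrame);
}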

Related

Did anyone notice Audio Converter Services changed internally on iOS 11?

My app uses Audio Converter Services to convert audio from 44.1 kHz to 48 kHz (16-bit linear PCM, mono), using AudioConverterFillComplexBuffer.
After upgrading iOS to 11.0 (or maybe 11.4), the audio contains "noises" that are caused by the callback returning samples with a value of zero at the "edges" of the buffer (I'm not sure whether it's the first or the last sample).
Does anyone know of, or has anyone noticed, any change? It has been working fine for years, and still works fine on devices running iOS 9.x.
This is my setup:
#include <AudioToolbox/AudioToolbox.h>

// prepare the formats
// origin: 44.1 kHz, mono, 16-bit signed integer, packed, native-endian
AudioStreamBasicDescription originFormat = {0};
FillOutASBDForLPCM(originFormat, 44100.0, 1, sizeof(SInt16) * 8, sizeof(SInt16) * 8, false, false, false);
originFormat.mFormatFlags |= kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
originFormat.mReserved = 0;

// destination: same layout at 48 kHz
AudioStreamBasicDescription destFormat = {0};
FillOutASBDForLPCM(destFormat, 48000.0, 1, sizeof(SInt16) * 8, sizeof(SInt16) * 8, false, false, false);
destFormat.mFormatFlags |= kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
destFormat.mReserved = 0;

// create a converter (check the returned OSStatus in production code)
AudioConverterRef audioConverter;
AudioConverterNew(&originFormat, &destFormat, &audioConverter);
I have found that converting between sample rates used to be more tolerant of missing data at the edges of the buffer.
For example, if you convert a buffer of 1024 frames and need all of them converted to the new sample rate, but never provide samples before or after the buffer, Apple's converter used to round things off so that the noise was minimal.
However, starting with iOS 11.4 (or so), the first frame of the converted buffer is very close to zero (probably because the converter looks for samples before the first sample and can't find any).
The fix was to provide some extra samples around the buffer in question. For example, to convert the 1024-frame buffer, I sent the converter about 100 samples before and after that range (1224 in total), then read the result starting from sample number 100. Once I did this for every buffer, the result was clean.
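Here is a sketch of that workaround, with illustrative names and numbers (kPad, kFrames, and PadForConversion are mine; the AudioConverterFillComplexBuffer plumbing and error handling are omitted):

#include <AudioToolbox/AudioToolbox.h>
#include <string.h>

enum { kFrames = 1024, kPad = 100 };

// Copy the frames to convert plus kPad frames of real context on each
// side. Assumes `source` has at least kPad valid frames before `start`
// and after `start + kFrames`, and that `padded` holds
// kFrames + 2 * kPad samples.
static void PadForConversion(const SInt16 *source, size_t start, SInt16 *padded) {
    memcpy(padded, source + start - kPad,
           (kFrames + 2 * kPad) * sizeof(SInt16));
}

// After converting `padded`, skip the leading padding in the output.
// Strictly, kPad input frames at 44.1 kHz correspond to roughly
// kPad * 48000 / 44100, which is about 109 frames at 48 kHz; reading
// from output sample 100, as described above, worked in practice.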

Last AudioQueueBuffer has exaggerated magnitudes

The one who solves this deserves the Sherlock Holmes trophy. Here it goes.
I'm using AudioQueues to record sound (LPCM, SInt16, 4 buffers). In the callback, I tried measuring the mean amplitude by converting the samples to float and using vDSP_meamgv. Here are some example means:
Mean, No of samples
44.400364, 44100
36.077393, 44100
27.672422, 41984
2889.821289, 44100
57.481972, 44100
58.967506, 42872
54.691631, 44100
2894.467285, 44100
62.697800, 42872
63.732948, 44100
66.575623, 44100
2979.566406, 42872
As you can see, every fourth (the last) buffer is wild. I looked at the individual samples; there are lots of 0's and lots of huge numbers, and no normal values like in the other buffers. It gets more interesting: if I use 3 buffers instead, the third one (always the last) is the bogey. And this holds for any number of buffers I choose.
I put an if in the callback to not enqueue the wild buffer, and once it's gone there are no more huge numbers; the other buffers continue to fill normally. I added a button that re-enqueues this buffer after it has been dropped, and once I re-enqueue it, it again gets filled with gigantic samples (that same buffer!).
And now the cherry on top: I put my mean-calculating code into other projects, like Apple's SpeakHere sample, and the same thing happens there, although the app works fine, recording and playing back what was recorded.
I just don't get it; I've racked my brain trying to figure this one out. If somebody has a clue...
Here's the callback, if it helps:
void Recorder::MyInputBufferHandler(void *inUserData,
                                    AudioQueueRef inAQ,
                                    AudioQueueBufferRef inBuffer,
                                    const AudioTimeStamp *inStartTime,
                                    UInt32 inNumPackets,
                                    const AudioStreamPacketDescription *inPacketDesc) {
    Recorder *eu = (Recorder *)inUserData;
    vDSP_vflt16((SInt16 *)inBuffer->mAudioData, 1, eu->conveier, 1,
                inBuffer->mAudioDataByteSize);
    float mean;
    vDSP_meamgv(eu->conveier, 1, &mean, inBuffer->mAudioDataByteSize);
    printf("values: %f, %d\n", mean, inBuffer->mAudioDataByteSize);
    // if (mean < 2300)
    AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL);
}
'conveier' is a float array I've preallocated.
It's also me that gets the trophy. The error was that the vDSP functions shouldn't have been given the mAudioDataByteSize parameter, because they expect the number of ELEMENTS in the array. In my case each element (an SInt16) is 2 bytes, so I should have passed mAudioDataByteSize / 2. When it read the last buffer, it ran off the end by a whole buffer's length and counted random data. Voila! A very basic mistake, but when you look in all the wrong places, it doesn't appear so.
For anybody that stepped on the same rake...
PS. It came to me while taking a bath :)
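For anyone who wants the concrete fix, here are the same two vDSP calls as a drop-in replacement for the ones in the callback above, with the element count passed instead of the byte count:

// vDSP wants the number of elements; each SInt16 sample is 2 bytes.
vDSP_Length n = inBuffer->mAudioDataByteSize / sizeof(SInt16);
vDSP_vflt16((SInt16 *)inBuffer->mAudioData, 1, eu->conveier, 1, n);
float mean;
vDSP_meamgv(eu->conveier, 1, &mean, n);
printf("values: %f, %lu\n", mean, (unsigned long)n);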

Non IDR Picture NAL Units - 0x21 and 0x61 meaning

Does anyone know what 0x21 and 0x61 mean in an H.264 encoded video stream?
I know that 0x01 means it's a B-frame and 0x41 means it's a P-frame. My encoded video gives me two 0x21 frames followed by one B-frame.
I 21 21 B 21 21 B......
What is this 0x21?
First point: a NALU is not the same as a frame. A frame can contain more than one NALU (but not fewer). A frame can also be made up of more than one slice type; a single frame can have I, B, and P slices. If it is an IDR frame, then EVERY slice of that frame must be IDR.
0x01 is NOT a B slice. It is a "Coded slice of a non-IDR picture", exactly like 0x21 and 0x61. It could be an I, B, or P slice; you need to parse the slice_type to know more.
From H.264 spec:
7.3.1 NAL unit syntax
forbidden_zero_bit - 1 bit - shall be equal to 0.
nal_ref_idc - 2 bits - not equal to 0 specifies that the content of the NAL unit contains a sequence parameter set [...]
nal_unit_type - 5 bits - specifies the type of RBSP data structure contained in the NAL unit [...]
0x21 and 0x61 make it NAL unit type 1 (Coded slice of a non-IDR picture) with different values for nal_ref_idc.
UPD. There is no one-to-one mapping from a specific bit, especially one at a fixed position from the beginning of the "frame", that says it's an I/P/B frame. You will need to parse the bitstream and read the values per 7.4.3 "Slice header semantics" of the H.264 spec (this is still doable in most cases, since the value is very close to the beginning of the bitstream; check the H.264 spec for details).
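To see this concretely, the single NAL unit header byte can be unpacked with a few shifts (a minimal sketch; the function name is mine):

#include <stdint.h>
#include <stdio.h>

// Decode the one-byte NAL unit header per H.264 section 7.3.1.
static void DecodeNALHeader(uint8_t b) {
    int forbidden_zero_bit = (b >> 7) & 0x01; // shall be 0
    int nal_ref_idc        = (b >> 5) & 0x03; // 0 = not used for reference
    int nal_unit_type      =  b       & 0x1F; // 1 = non-IDR slice, 5 = IDR
    printf("0x%02X -> forbidden=%d, ref_idc=%d, type=%d\n",
           b, forbidden_zero_bit, nal_ref_idc, nal_unit_type);
}

// DecodeNALHeader(0x21); // ref_idc=1, type=1
// DecodeNALHeader(0x41); // ref_idc=2, type=1
// DecodeNALHeader(0x61); // ref_idc=3, type=1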

AudioQueue Bytes sent to server

I am using AudioQueues to get chunks of audio samples.
Here is my callback method:
void AQRecorder::MyInputBufferHandler(void *inUserData,
                                      AudioQueueRef inAQ,
                                      AudioQueueBufferRef inBuffer,
                                      const AudioTimeStamp *inStartTime,
                                      UInt32 inNumPackets,
                                      const AudioStreamPacketDescription *inPacketDesc)
There is an API that expects me to send a byte array (which I am not familiar with). Which variable should I send in this case?
There is not a lot of documentation about this one.
The mDataByteSize element of the C struct pointed to by inPacketDesc will tell you the number of bytes per packet, and the inNumPackets function parameter is the number of packets sent to your Audio Queue callback. Multiply the two to get the total number of bytes to send.
The app might also have specified the number of bytes per packet when configuring the Audio Queue, so you could just use that number.
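For linear PCM specifically, the buffer you get in the callback already is the byte array: inBuffer->mAudioData points at the samples and inBuffer->mAudioDataByteSize is their length in bytes. A minimal sketch (SendToServer is a placeholder for whatever network call your API expects):

#include <AudioToolbox/AudioToolbox.h>

// Forward one captured buffer to the network layer. For LPCM there are
// no per-packet descriptions to worry about; the whole payload is here.
static void ForwardBuffer(AudioQueueBufferRef inBuffer,
                          void (*SendToServer)(const void *bytes, size_t length)) {
    SendToServer(inBuffer->mAudioData, inBuffer->mAudioDataByteSize);
}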

Do I Need to Set the ASBD of a Core Audio File Player Audio Unit?

I've specified and instantiated two Audio Units: a multichannel mixer unit and a generator of subtype AudioFilePlayer.
I would have thought I needed to set the filePlayer's output ASBD to match the ASBD I set on the mixer input. However, when I attempt to set the filePlayer's output, I get a kAudioUnitErr_FormatNotSupported (-10868) error.
Here's the stream format I set on the mixer input (successfully) and am also trying to set on the filePlayer (it's the mono stream format copied from Apple's MixerHost sample project):
Sample Rate: 44100
Format ID: lpcm
Format Flags: C
Bytes per Packet: 2
Frames per Packet: 1
Bytes per Frame: 2
Channels per Frame: 1
Bits per Channel: 16
In the course of troubleshooting this I queried the filePlayer AU for the format it is 'natively' set to. This is what's returned:
Sample Rate: 44100
Format ID: lpcm
Format Flags: 29
Bytes per Packet: 4
Frames per Packet: 1
Bytes per Frame: 4
Channels per Frame: 2
Bits per Channel: 32
All the example code I've found sends the output of the filePlayer unit to an effect unit and sets the filePlayer's output to match the ASBD set for the effect unit. Given that I have no effect unit, it seems like setting the filePlayer's output to the mixer input's ASBD would be the correct, and required, thing to do.
How have you configured the AUGraph? I might need to see some code to help you out.
Setting the output-scope ASBD of the AUMultiChannelMixer once only (as in MixerHost) works. However, if you have any kind of effect at all, you will need to think about where their ASBDs are defined and how you arrange your code so Core Audio does not jump in and mess with your effect AudioUnits' ASBDs. By messing with them, I mean overriding your ASBD with the default: kAudioFormatFlagIsFloat, kAudioFormatFlagIsPacked, 2 channels, non-interleaved. This was a big pain for me at first.
I would set the effect AudioUnits to their default ASBD. Assuming you have connected the AUFilePlayer node, you can then pull the unit out later in the program like this:
result = AUGraphNodeInfo(processingGraph,
                         filePlayerNode,
                         NULL,
                         &filePlayerUnit);
And then proceed to set
AudioUnitSetProperty(filePlayerUnit,
                     kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Output,
                     0,
                     &monoStreamFormat,
                     sizeof(monoStreamFormat));
Hopefully this helps.
Basically, I didn't bother setting the filePlayer's ASBD; instead, I retrieved the 'native' ASBD it was already set to and updated only the sample rate and channel count.
Likewise, I didn't set the input format on the mixer and let the mixer figure out its own format.
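Here is a sketch of that approach, assuming filePlayerUnit was pulled out of the graph with AUGraphNodeInfo as shown above (error handling omitted; the sample rate and channel count are illustrative):

AudioStreamBasicDescription asbd = {0};
UInt32 size = sizeof(asbd);
// Fetch the unit's current ("native") output format...
AudioUnitGetProperty(filePlayerUnit, kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Output, 0, &asbd, &size);
// ...change only what you need...
asbd.mSampleRate = 44100.0;
asbd.mChannelsPerFrame = 1;
// ...and write it back.
AudioUnitSetProperty(filePlayerUnit, kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Output, 0, &asbd, sizeof(asbd));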
