Whoever solves this one deserves the Sherlock Holmes trophy. Here it goes.
I'm using Audio Queues to record sound (LPCM, SInt16, 4 buffers). In the callback, I tried measuring the mean amplitude by converting the samples to float and using vDSP_meamgv. Here are some example means:
Mean, No of samples
44.400364, 44100
36.077393, 44100
27.672422, 41984
2889.821289, 44100
57.481972, 44100
58.967506, 42872
54.691631, 44100
2894.467285, 44100
62.697800, 42872
63.732948, 44100
66.575623, 44100
2979.566406, 42872
As you can see, every fourth (i.e. the last) buffer is wild. I looked at the individual samples: there are lots of 0's and lots of huge numbers, and none of the normal values the other buffers have. It gets more interesting: if I use 3 buffers instead, the third one (again, always the last) is the rogue one, and the same holds for whatever number of buffers I choose.
I put an if in the callback so the wild buffer is not re-enqueued, and once it's gone there are no more huge numbers; the other buffers keep filling normally. I also added a button that re-enqueues that buffer after it has been dropped, and as soon as I re-enqueue it, that very buffer gets filled with gigantic samples again!
And now the cherry on top: I put my mean-calculation code into other projects, like Apple's SpeakHere sample, and the same thing happens there, even though the app otherwise works fine, recording and playing back what was recorded.
I just don't get it; I've racked my brain trying to figure this one out. If somebody has a clue...
Here's the callback, if it helps:
void Recorder::MyInputBufferHandler(void * inUserData,
                                    AudioQueueRef inAQ,
                                    AudioQueueBufferRef inBuffer,
                                    const AudioTimeStamp * inStartTime,
                                    UInt32 inNumPackets,
                                    const AudioStreamPacketDescription* inPacketDesc) {
    Recorder* eu = (Recorder*)inUserData;
    vDSP_vflt16((SInt16*)inBuffer->mAudioData, 1, eu->conveier, 1, inBuffer->mAudioDataByteSize);
    float mean;
    vDSP_meamgv(eu->conveier, 1, &mean, inBuffer->mAudioDataByteSize);
    printf("values: %f, %d\n", mean, inBuffer->mAudioDataByteSize);
    // if (mean<2300)
    AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL);
}
'conveier' is a float array I've preallocated.
And it's me who gets the trophy. The error was that the vDSP functions shouldn't have been given mAudioDataByteSize, because they expect the number of ELEMENTS in the array. In my case each element (an SInt16) is 2 bytes, so I should have passed mAudioDataByteSize / 2. With the count doubled, the read ran a whole buffer's length past the end; for the earlier buffers that overrun presumably just landed in the next buffer's valid samples, but for the last one it averaged in random memory. Voilà! A very basic mistake, but when you're looking in all the wrong places, it doesn't seem so.
For anybody who steps on the same rake...
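For reference, here is a minimal sketch of the corrected calculation (written as Swift here, with conveier becoming a preallocated Float scratch pointer at least as large as one queue buffer); the point is simply that the vDSP calls get an element count, i.e. the byte size divided by the size of an SInt16:
import AudioToolbox
import Accelerate

// Mean magnitude of one Audio Queue buffer of SInt16 samples.
// `scratch` is a preallocated Float buffer with room for a full queue buffer.
func meanMagnitude(of buffer: AudioQueueBufferRef,
                   scratch: UnsafeMutablePointer<Float>) -> Float {
    // Element count, not byte count: each SInt16 sample is 2 bytes.
    let sampleCount = Int(buffer.pointee.mAudioDataByteSize) / MemoryLayout<Int16>.size
    let samples = buffer.pointee.mAudioData.assumingMemoryBound(to: Int16.self)
    vDSP_vflt16(samples, 1, scratch, 1, vDSP_Length(sampleCount))   // SInt16 -> Float
    var mean: Float = 0
    vDSP_meamgv(scratch, 1, &mean, vDSP_Length(sampleCount))        // mean magnitude
    return mean
}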
PS. It came to me while taking a bath :)
I have a fairly complex app that has been working with the AKAppleSequencer up until now, but due to some strange behavior and bugs that pop up now and then with that sequencer, I've been hoping to move to the newer AKSequencer. Unfortunately, the new sequencer doesn't seem to be represented in the Playgrounds or much documentation, so I have been doing some guesswork. I have everything wired up in a way that seems to make sense (to me) and, as I mentioned, was working fine with AKAppleSequencer, but with AKSequencer it runs but no output is produced.
My code is broken out into multiple pieces, so the node graph gets built up in disparate locations; I'll have to show it here in chunks, with irrelevant lines deleted.
// This happens during setup
mainMixer = AKMixer()
mainMixer.volume = volume
AudioKit.output = mainMixer
// In later code, the sequencer is constructed
sequencer = AKSequencer()
sequencer!.tempo = tempo
// After the sequencer is created, I create various nodes and tracks, like this
let trackNode = trackDefinition.createNode()
let track = sequencer.addTrack(for: trackNode)
track >>> mainMixer
There's a line up there where I'm calling "createNode()" on a thing called trackDefinition. I don't think the details of that class are relevant here, but here's an example of the body of that method's code. It's pretty straightforward.
func createNode() -> AKNode {
    let pad = AKMIDISampler()
    do {
        try pad.loadSoundFont(partConfiguration.settings["soundFontName"]!,
                              preset: Int(partConfiguration.settings["preset"]!)!,
                              bank: Int(partConfiguration.settings["bank"]!)!)
    } catch {
        print("Error while loading Sound Font in PadTrackDefinition: \(error)")
    }
    return pad
}
That code seems to be working fine. I just wanted to illustrate that I'm creating an AKMIDISampler node, loading a soundfont, and then using that node to create a track in the AKSequencer. Then I attach the track to the main mixer for output.
I used AudioKit.printConnections() to get some confirmation, and here's what that looks like.
(1]AUMultiChannelMixer <2 ch, 44100 Hz, Float32, non-inter> -> (0]AudioDeviceOutput) bus: 0
(2]Local AKSequencerTrack <2 ch, 44100 Hz, Float32, non-inter> -> (1]AUMultiChannelMixer) bus: 0
Pretty simple... Track >>> Mixer >>> Output
Doesn't make any sound when playing.
I also tried it this way:
(0]AUSampler <2 ch, 44100 Hz, Float32, non-inter> -> (2]AUMultiChannelMixer) bus: 0
(2]AUMultiChannelMixer <2 ch, 44100 Hz, Float32, non-inter> -> (1]AudioDeviceOutput) bus: 0
So that's AKMIDISampler >>> Mixer >>> Output (and the sampler was used to create a track).
That also doesn't make any sound.
I also saw this answer to a similar question on StackOverflow, so I tried that approach. That gave me this connection graph:
(0]AUMultiChannelMixer <2 ch, 44100 Hz, Float32, non-inter> -> (1]AudioDeviceOutput) bus: 0
(2]Local AKSequencerTrack <2 ch, 44100 Hz, Float32, non-inter> -> (0]AUMultiChannelMixer) bus: 0
(3]AUSampler <2 ch, 44100 Hz, Float32, non-inter> -> (0]AUMultiChannelMixer) bus: 1
That would be [AKMIDISampler, Track] >>> Mixer >>> Output.
Still...no sound.
What am I doing wrong here? Is there some more specific way that the new sequencer tracks have to be connected into the signal graph that I'm not understanding?
UPDATE: Weird/fun/interesting addendum: if I add this code immediately after the node construction code, it produces the expected note, so I know that at least the audio engine itself is hooked up:
let midiNode = trackNode as! AKMIDISampler
try! midiNode.play(noteNumber: 60,
velocity: MIDIVelocity(127),
channel: MIDIChannel(8))
I figured this out, and wanted to post the answer here for future developers who may run into confusion around this, and also for the core AudioKit team to see, so they can understand what might not be obvious from the API.
The root of the problem here was that the AKSequencer is not a drop-in replacement for the AKAppleSequencer, even though the APIs for the two are extremely similar.
One thing to point out: I have confirmed that it is in fact necessary to add both the track itself and the track's target node to the signal chain in order to get sound output. So from my examples above, you need this one:
[AKMIDISampler, Track] >>> Mixer >>> Output
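In code, that hookup ends up looking roughly like this (same names as in the question):
let trackNode = trackDefinition.createNode()      // the AKMIDISampler
let track = sequencer.addTrack(for: trackNode)    // the AKSequencerTrack
trackNode >>> mainMixer                           // the sampler node into the mixer...
track >>> mainMixer                               // ...and the track itself as well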
This is sort of weird and confusing, because it's not at all obvious where I would be expected to put effects nodes in between those. I haven't played with that yet, but it seems very strange to have these nodes both be siblings in the signal chain. I would think it would look like this:
Track >>> AKMIDISampler >>> Mixer >>> Output
That makes more sense to me. Oh well.
Anyway, I mentioned that there were some other factors that were the root of the problem. The key difference was that with the AKAppleSequencer, the track lengths could start out at 0 and then grow as you added additional notes to them. This is the approach I was using, as I was starting with empty tracks and then populating them procedurally.
With the new AKSequencer, it doesn't appear to work that way. The length starts out at 4.0, not 0, and it does not grow automatically as you add notes to the tracks. I had to calculate the length required to fit my notes manually, and then set it with track.length = desiredLength. The good news is that the AKSequencer respects each track's length, so you can set it on just the tracks rather than on the sequencer itself if you prefer.
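Here's a rough sketch of what that length fix looked like; NoteEvent is just a stand-in for whatever model your notes come from, with positions and durations in beats:
import AudioKit

struct NoteEvent {
    let position: Double   // in beats
    let duration: Double   // in beats
}

func fitLength(of track: AKSequencerTrack, to notes: [NoteEvent]) {
    // Unlike AKAppleSequencer tracks, these stay at the default length of 4.0
    // unless you set it yourself.
    let desiredLength = notes.map { $0.position + $0.duration }.max() ?? 4.0
    track.length = desiredLength
}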
Another notable difference is the behavior of stop() on the sequencer. On the AKAppleSequencer, invoking stop() also stops playback of any sounding notes. On the new AKSequencer, the same method leaves notes ringing, so you need to loop over the tracks like this:
sequencer.stop()
for track in sequencer.tracks {
    track.stopPlayingNotes()
}
I know the AKSequencer is brand new, so some things like this are to be expected. I still have hope that it is going to be better in the long run than the AKAppleSequencer.
I hope this explanation will help out somebody like me who got stuck switching to the new sequencer!
My app uses Audio Converter Services to convert audio from 44.1 kHz to 48 kHz (16-bit linear PCM, mono) with AudioConverterFillComplexBuffer.
After upgrading iOS to 11.0 (or maybe 11.4), the audio contains "noises" caused by the callback returning zero-valued samples at the "edges" of the buffer (I'm not sure whether it's the first or the last sample).
Does anyone know of, or has anyone noticed, any change? This has been working fine for years, and still works fine on devices running iOS 9.x.
This is my setup:
// prepare the formats
// origin
AudioStreamBasicDescription originFormat = {0};
FillOutASBDForLPCM(originFormat, 44100.00, 1, sizeof(SInt16)*8, sizeof(SInt16)*8, false, false, false);
originFormat.mFormatFlags |= kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
originFormat.mReserved = 0;
// destination
AudioStreamBasicDescription destFormat = {0};
FillOutASBDForLPCM(destFormat, 48000.0, 1, sizeof(SInt16)*8, sizeof(SInt16)*8, false, false, false);
destFormat.mFormatFlags |= kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
destFormat.mReserved = 0;
// create a converter
AudioConverterRef audioConverter;
AudioConverterNew(&originFormat, &destFormat, &audioConverter);
I have found that sample-rate conversion used to be more tolerant of missing data at the edges of the buffer.
For example, if you converted a buffer of 1024 frames and needed all of them converted to the new sample rate, but never provided any samples before or after that buffer, Apple's converter used to round things off at the edges so the noise was minimal.
However, starting with iOS 11.4 (or thereabouts), the first frame of the converted buffer comes out very close to zero, probably because the converter looks for samples before the first sample and can't find any.
The fix was to provide some extra context samples around the buffer in question. For example, to convert the 1024-frame buffer, I sent the converter about 100 samples before and after that range (1224 in total), then read the result starting from around sample number 100. Once I did this for every buffer, the result was clean.
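As a rough sketch of the idea (plain Swift arithmetic, with convert standing in for whatever wrapper you have around AudioConverterFillComplexBuffer; here the context is trimmed off scaled to the destination rate):
// Convert `core` with ~100 frames of real context on each side, then trim the
// context back off so only the samples corresponding to `core` remain.
func convertWithContext(previousTail: [Int16],        // ~100 frames before the block
                        core: [Int16],                // the 1024 frames we care about
                        nextHead: [Int16],            // ~100 frames after the block
                        convert: ([Int16]) -> [Int16]) -> [Int16] {
    let padded = previousTail + core + nextHead       // about 1224 frames in
    let converted = convert(padded)
    let leading = Int(Double(previousTail.count) * 48000.0 / 44100.0)
    let trailing = Int(Double(nextHead.count) * 48000.0 / 44100.0)
    return Array(converted[leading ..< converted.count - trailing])
}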
I'm writing an application in which I need to play parts of audio files. Each audio file contains audio data for a separate track.
These parts are sections with a begin time and an end time, and I'm trying to play those parts in the order I choose.
So for example, imagine I have 4 sections:
A - B - C - D
and I activate B and D, I want to play B, then D, then B again, then D, etc.
To make the "jumps" in playback smooth, I think it's important to fade in/out the buffers at the start/end of each section.
So I have a basic AVAudioEngine setup, with an AVAudioPlayerNode and a mixer.
For each audio section, I cache some information:
a buffer for the first samples in the section (which I fade in manually)
a tuple with the AVAudioFramePosition and AVAudioFrameCount of the middle segment
a buffer for the last samples in the section (which I fade out manually)
Now, when I schedule a section for playback, I tell the AVAudioPlayerNode to:
schedule the start buffer (scheduleBuffer(_:completionHandler:), no options)
schedule the middle segment (scheduleSegment(_:startingFrame:frameCount:at:completionHandler:))
and finally schedule the end buffer (scheduleBuffer(_:completionHandler:), no options)
all at a "time" of nil, roughly as in the sketch below.
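Roughly, the scheduling looks like this (player is the AVAudioPlayerNode, file the AVAudioFile, and startBuffer / middleSegment / endBuffer come from the cached section data shown further down):
import AVFoundation

func scheduleSection(on player: AVAudioPlayerNode,
                     file: AVAudioFile,
                     startBuffer: AVAudioPCMBuffer,
                     middleSegment: (position: AVAudioFramePosition, frameCount: AVAudioFrameCount),
                     endBuffer: AVAudioPCMBuffer) {
    player.scheduleBuffer(startBuffer, completionHandler: nil)
    player.scheduleSegment(file,
                           startingFrame: middleSegment.position,
                           frameCount: middleSegment.frameCount,
                           at: nil,
                           completionHandler: nil)
    player.scheduleBuffer(endBuffer, completionHandler: nil)
}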
The problem is that I can hear clicks and other artifacts at section boundaries, and I can't see what I'm doing wrong.
My first suspect was the fades I apply manually (basically multiplying sample values by a volume factor), but I get the same result without them.
I thought I might not be scheduling in time, but scheduling sections in advance (A - B - C, for example) gives the same result.
I then tried different frame-position computations based on the audio format settings; same result.
So I'm out of ideas here; perhaps I haven't understood the scheduling mechanism correctly.
Can anyone confirm that I can mix scheduled buffers and segments on an AVAudioPlayerNode? Or should I schedule only buffers, or only segments?
I can confirm that scheduling only segments works; playback is perfectly fine.
A little context on how I cache information for audio sections...
In the code below, file is of type AVAudioFile loaded on disk from a URL, begin and end are TimeInterval values, and represent the start/end of my audio section.
let format = file.processingFormat
let startBufferFrameCount: AVAudioFrameCount = 4096
let endBufferFrameCount: AVAudioFrameCount = 4096
let audioSectionStartFrame = framePosition(at: begin, format: format)
let audioSectionEndFrame = framePosition(at: end, format: format)
let segmentStartFrame = audioSectionStartFrame + AVAudioFramePosition(startBufferFrameCount)
let segmentEndFrame = audioSectionEndFrame - AVAudioFramePosition(endBufferFrameCount)
startBuffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: startBufferFrameCount)
endBuffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: endBufferFrameCount)
file.framePosition = audioSectionStartFrame
try file.read(into: startBuffer)
file.framePosition = segmentEndFrame
try file.read(into: endBuffer)
middleSegment = (segmentStartFrame, AVAudioFrameCount(segmentEndFrame - segmentStartFrame))
frameCount = AVAudioFrameCount(audioSectionEndFrame - audioSectionStartFrame)
Also, the framePosition(at:format:) multiplies the TimeInterval value by the sample rate of the AVAudioFormat passed in.
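In other words, that helper is essentially:
import AVFoundation

func framePosition(at time: TimeInterval, format: AVAudioFormat) -> AVAudioFramePosition {
    return AVAudioFramePosition(time * format.sampleRate)
}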
I cache this information for every audio section, but I hear clicks at section boundaries whether I schedule them in advance or not.
I also tried not mixing buffers and segments when scheduling, but it doesn't change anything, so I'm starting to think my frame computations are wrong.
I am using Audio Queues to get chunks of audio samples.
Here is my callback method:
void AQRecorder::MyInputBufferHandler(void * inUserData,
                                      AudioQueueRef inAQ,
                                      AudioQueueBufferRef inBuffer,
                                      const AudioTimeStamp * inStartTime,
                                      UInt32 inNumPackets,
                                      const AudioStreamPacketDescription* inPacketDesc)
There is an API (that I'm not familiar with) which expects me to send a byte array. Which variable should I send in this case?
There isn't much documentation about this one.
The mDataByteSize element of the C struct pointed to by inPacketDesc will tell you the number of bytes per packet. And the inNumPackets function parameter is the number of packets sent to your Audio Queue callback function. Multiply the two to get the total number of bytes to send.
The app might also have set up the number of bytes per packet when configuring the Audio Queue, so you could just use that number.
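If it helps, here's a small Swift sketch of handing that data off as a plain byte array, where byteCount is the total computed as described above:
import AudioToolbox

// Copy the callback's buffer contents into a byte array of the computed size.
func bytes(from buffer: AudioQueueBufferRef, byteCount: Int) -> [UInt8] {
    let raw = buffer.pointee.mAudioData.assumingMemoryBound(to: UInt8.self)
    return Array(UnsafeBufferPointer(start: raw, count: byteCount))
}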
I need help understanding the following ASBD. It's the default ASBD assigned to a fresh instance of RemoteIO (I got it by executing AudioUnitGetProperty(..., kAudioUnitProperty_StreamFormat, ...) on the RemoteIO audio unit, right after allocating and initializing it).
Float64 mSampleRate 44100
UInt32 mFormatID 1819304813
UInt32 mFormatFlags 41
UInt32 mBytesPerPacket 4
UInt32 mFramesPerPacket 1
UInt32 mBytesPerFrame 4
UInt32 mChannelsPerFrame 2
UInt32 mBitsPerChannel 32
UInt32 mReserved 0
The question is, shouldn't mBytesPerFrame be 8? If I have 32 bits (4 bytes) per channel, and 2 channels per frame, shouldn't each frame be 8 bytes long (instead of 4)?
Thanks in advance.
The value of mBytesPerFrame depends on mFormatFlags. From CoreAudioTypes.h:
Typically, when an ASBD is being used, the fields describe the complete layout
of the sample data in the buffers that are represented by this description -
where typically those buffers are represented by an AudioBuffer that is
contained in an AudioBufferList.
However, when an ASBD has the kAudioFormatFlagIsNonInterleaved flag, the
AudioBufferList has a different structure and semantic. In this case, the ASBD
fields will describe the format of ONE of the AudioBuffers that are contained in
the list, AND each AudioBuffer in the list is determined to have a single (mono)
channel of audio data. Then, the ASBD's mChannelsPerFrame will indicate the
total number of AudioBuffers that are contained within the AudioBufferList -
where each buffer contains one channel. This is used primarily with the
AudioUnit (and AudioConverter) representation of this list - and won't be found
in the AudioHardware usage of this structure.
I believe that because the format flags (41 = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved) include kAudioFormatFlagIsNonInterleaved, each AudioBuffer carries a single channel, so the size of a frame within any one buffer can only be the size of a one-channel frame. If this is correct, mChannelsPerFrame is certainly a confusing name.
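Here's a quick Swift sanity check of that reading: 41 is just Float | Packed | NonInterleaved, and with the non-interleaved flag set, bytes-per-frame describes a one-channel frame:
import AudioToolbox

let flags: AudioFormatFlags = 41
assert(flags == (kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved))

// Bytes per frame as stored in a single buffer: all channels when interleaved,
// one channel when the non-interleaved flag is set.
func bytesPerFrame(bitsPerChannel: UInt32, channels: UInt32, flags: AudioFormatFlags) -> UInt32 {
    let interleaved = (flags & kAudioFormatFlagIsNonInterleaved) == 0
    return (bitsPerChannel / 8) * (interleaved ? channels : 1)
}

print(bytesPerFrame(bitsPerChannel: 32, channels: 2, flags: flags))  // 4, matching the ASBD above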
I hope someone else will confirm / clarify this.