I have a fairly complex app that has been working with the AKAppleSequencer up until now, but due to some strange behavior and bugs that pop up now and then with that sequencer, I've been hoping to move to the newer AKSequencer. Unfortunately, the new sequencer doesn't seem to be represented in the Playgrounds or in much documentation, so I have been doing some guesswork. I have everything wired up in a way that seems (to me) to make sense and, as I mentioned, was working fine with AKAppleSequencer, but with AKSequencer it runs without producing any output.
My code is broken into multiple pieces, so the node graph gets built up in disparate locations. I'll have to show it here in chunks, with irrelevant lines removed.
// This happens during setup
mainMixer = AKMixer()
mainMixer.volume = volume
AudioKit.output = mainMixer
// In later code, the sequencer is constructed
sequencer = AKSequencer()
sequencer!.tempo = tempo
// After the sequencer is created, I create various nodes and tracks, like this
let trackNode = trackDefinition.createNode()
let track = sequencer.addTrack(for: trackNode)
track >>> mainMixer
There's a line up there where I'm calling "createNode()" on a thing called trackDefinition. I don't think the details of that class are relevant here, but here's an example of the body of that method's code. It's pretty straightforward.
func createNode() -> AKNode {
    let pad = AKMIDISampler()
    do {
        try pad.loadSoundFont(partConfiguration.settings["soundFontName"]!,
                              preset: Int(partConfiguration.settings["preset"]!)!,
                              bank: Int(partConfiguration.settings["bank"]!)!)
    } catch {
        print("Error while loading Sound Font in PadTrackDefinition: \(error)")
    }
    return pad
}
That code seems to be working fine. I just wanted to illustrate that I'm creating an AKMIDISampler node, loading a soundfont, and then using that node to create a track in the AKSequencer. Then I attach the track to the main mixer for output.
I used AudioKit.printConnections() to get some confirmation, and here's what that looks like.
(1]AUMultiChannelMixer <2 ch, 44100 Hz, Float32, non-inter> -> (0]AudioDeviceOutput) bus: 0
(2]Local AKSequencerTrack <2 ch, 44100 Hz, Float32, non-inter> -> (1]AUMultiChannelMixer) bus: 0
Pretty simple... Track >>> Mixer >>> Output
Doesn't make any sound when playing.
I also tried it this way:
(0]AUSampler <2 ch, 44100 Hz, Float32, non-inter> -> (2]AUMultiChannelMixer) bus: 0
(2]AUMultiChannelMixer <2 ch, 44100 Hz, Float32, non-inter> -> (1]AudioDeviceOutput) bus: 0
So that's AKMIDISampler >>> Mixer >>> Output (and the sampler was used to create a track).
That also doesn't make any sound.
I also saw this answer to a similar question on StackOverflow, so I tried that approach. That gave me this connection graph:
(0]AUMultiChannelMixer <2 ch, 44100 Hz, Float32, non-inter> -> (1]AudioDeviceOutput) bus: 0
(2]Local AKSequencerTrack <2 ch, 44100 Hz, Float32, non-inter> -> (0]AUMultiChannelMixer) bus: 0
(3]AUSampler <2 ch, 44100 Hz, Float32, non-inter> -> (0]AUMultiChannelMixer) bus: 1
That would be [AKMIDISampler, Track] >>> Mixer >>> Output.
Still...no sound.
What am I doing wrong here? Is there some more specific way that the new sequencer tracks have to be connected into the signal graph that I'm not understanding?
UPDATE: Weird/fun/interesting addendum: if I add this code immediately after the node construction code, it produces the expected note, so I know that at least the audio engine itself is hooked up:
let midiNode = trackNode as! AKMIDISampler
try! midiNode.play(noteNumber: 60,
                   velocity: MIDIVelocity(127),
                   channel: MIDIChannel(8))
I figured this out, and wanted to post the answer here for future developers who may run into confusion around this, and also for the core AudioKit team to see, so they can understand what might not be obvious from the API.
The root of the problem here was that the AKSequencer is not a drop-in replacement for the AKAppleSequencer, even though the APIs for the two are extremely similar.
One thing to point out: I have confirmed that it is in fact necessary to add both the track itself and the track's target node to the signal chain in order to get sound output. So from my examples above, you need this one:
[AKMIDISampler, Track] >>> Mixer >>> Output
This is sort of weird and confusing, because it's not at all obvious where I would be expected to put effects nodes in between those. I haven't played with that yet, but it seems very strange to have these nodes both be siblings in the signal chain. I would think it would look like this:
Track >>> AKMIDISampler >>> Mixer >>> Output
That makes more sense to me. Oh well.
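For reference, the wiring that ends up working looks roughly like this. It's a sketch based on my setup above (AudioKit 4.x), not a complete listing:
// Both the sampler and its AKSequencerTrack need to reach the mixer.
let sampler = AKMIDISampler()                  // sound font loaded as in createNode() above
let track = sequencer.addTrack(for: sampler)   // AKSequencerTrack targeting the sampler
sampler >>> mainMixer                          // audio path for the sampler itself
track >>> mainMixer                            // the track node also has to be in the chain
AudioKit.output = mainMixer
try AudioKit.start()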
Anyway, the connection layout wasn't the only issue; there were other differences that turned out to be the real root of the problem. The key difference was that with the AKAppleSequencer, the track lengths could start out at 0 and then grow as you added notes to them. This was the approach I was using, since I start with empty tracks and populate them procedurally.
With the new AKSequencer, it doesn't appear to work that way. The length starts out at 4.0, not 0, and it does not grow automatically as you add notes to the tracks. I had to manually calculate the length required to fit my notes, and then set it using track.length = desiredLength. The good news is that the AKSequencer respects each track's length, so you can set it on just the tracks rather than on the sequencer itself if you prefer.
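Here's a minimal sketch of that length calculation (the NoteEvent type and buildNotes() are just stand-ins for however you generate your note data):
// The length does not grow on its own, so compute it from the notes and set it explicitly.
struct NoteEvent { let position: Double; let duration: Double }   // illustrative only
let notes: [NoteEvent] = buildNotes()                             // however you create them

let desiredLength = notes.map { $0.position + $0.duration }.max() ?? 4.0
track.length = desiredLength   // in beats; setting sequencer.length as well is optional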
Another notable difference is the behavior of stop() on the sequencer. On the AKAppleSequencer, invoking stop() also stops the playback of all the notes. On the new AKSequencer, the same method leaves notes playing. You need to loop over the tracks like this:
sequencer.stop()
for track in sequencer.tracks {
    track.stopPlayingNotes()
}
I know the AKSequencer is brand new, so some things like this are to be expected. I still have hope that it is going to be better in the long run than the AKAppleSequencer.
I hope this explanation will help out somebody like me who got stuck switching to the new sequencer!
Related
I'm trying to change device of the inputNode of AVAudioEngine.
To do so, I'm calling setDeviceID on its auAudioUnit. Although this call doesn't fail, something wrong happens to the output busses.
When I ask for its format, it reports 0 Hz and 0 channels, which makes the app crash when I try to connect the node to the mainMixerNode.
Can anyone explain what's wrong with this code?
avEngine = AVAudioEngine()
print(avEngine.inputNode.auAudioUnit.inputBusses[0].format)
// <AVAudioFormat 0x1404b06e0: 2 ch, 44100 Hz, Float32, non-inter>
print(avEngine.inputNode.auAudioUnit.outputBusses[0].format)
// <AVAudioFormat 0x1404b0a60: 2 ch, 44100 Hz, Float32, inter>
// Now, let's change a device from headphone's mic to built-in mic.
try! avEngine.inputNode.auAudioUnit.setDeviceID(inputDevice.deviceID)
print(avEngine.inputNode.auAudioUnit.inputBusses[0].format)
// <AVAudioFormat 0x1404add50: 2 ch, 44100 Hz, Float32, non-inter>
print(avEngine.inputNode.auAudioUnit.outputBusses[0].format)
// <AVAudioFormat 0x1404adff0: 0 ch, 0 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved>
// !!!
// Interestingly, 'inputNode' shows a different format than `auAudioUnit`
print(avEngine.inputNode.inputFormat(forBus: 0))
// <AVAudioFormat 0x1404af480: 1 ch, 44100 Hz, Float32>
print(avEngine.inputNode.outputFormat(forBus: 0))
// <AVAudioFormat 0x1404ade30: 1 ch, 44100 Hz, Float32>
Edit:
Further debugging reveals another puzzling thing.
avEngine.inputNode.auAudioUnit == avEngine.outputNode.auAudioUnit // this is true ?!
inputNode and outputNode share the same AUAudioUnit, and its deviceID is set to the speakers by default. It's so confusing to me... why would inputNode's device be a speaker?
I had a similar problem on both Simulator and device. My inputNode was reporting 2 channels but a 0 Hz sample rate. It turns out I was starting the AudioEngine before attaching the nodes. Moving the start to the last step fixed it.
I have written quite a few AVAudioEngine apps and can't believe I made this rookie mistake in my current app. It would be great, though, if Apple returned an error from the connect() method stating the issue instead of just raising a generic fatal error.
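In case it's useful, the ordering that works is roughly this (node names are placeholders, not from my actual app):
import AVFoundation

// Attach and connect everything before starting the engine; starting first is what
// left my inputNode reporting a 0 Hz format.
let engine = AVAudioEngine()
let player = AVAudioPlayerNode()

engine.attach(player)                                           // 1. attach nodes
engine.connect(player, to: engine.mainMixerNode, format: nil)   // 2. make connections
// ... install taps, choose inputs, etc.
try engine.start()                                              // 3. start last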
Also, I am using AVAudioSession's setPreferredInput instead of the setDeviceID call above, using the enableBuiltInMic method from here:
https://developer.apple.com/documentation/avfoundation/avaudiosession/capturing_stereo_audio_from_built-in_microphones
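A rough sketch of that session-based approach, adapted from the linked sample (double-check the details against it):
import AVFoundation

// Select the built-in mic via AVAudioSession rather than setDeviceID.
let session = AVAudioSession.sharedInstance()
try session.setCategory(.playAndRecord, options: [.defaultToSpeaker, .allowBluetooth])
if let builtInMic = session.availableInputs?.first(where: { $0.portType == .builtInMic }) {
    try session.setPreferredInput(builtInMic)
}
try session.setActive(true)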
I'm attempting to sync recorded audio (from an AVAudioEngine inputNode) to an audio file that was playing during the recording process. The result should be like multitrack recording where each subsequent new track is synced with the previous tracks that were playing at the time of recording.
Because sampleTime differs between the AVAudioEngine's output and input nodes, I use hostTime to determine the offset of the original audio and the input buffers.
On iOS, I would assume that I'd have to use AVAudioSession's various latency properties (inputLatency, outputLatency, ioBufferDuration) to reconcile the tracks as well as the host time offset, but I haven't figured out the magic combination to make them work. The same goes for the various AVAudioEngine and Node properties like latency and presentationLatency.
On macOS, AVAudioSession doesn't exist (outside of Catalyst), meaning I don't have access to those numbers. Meanwhile, the latency/presentationLatency properties on the AVAudioNodes report 0.0 in most circumstances. On macOS, I do have access to AudioObjectGetPropertyData and can ask the system about kAudioDevicePropertyLatency, kAudioDevicePropertyBufferSize, kAudioDevicePropertySafetyOffset, etc, but am again at a bit of a loss as to what the formula is to reconcile all of these.
I have a sample project at https://github.com/jnpdx/AudioEngineLoopbackLatencyTest that runs a simple loopback test (on macOS, iOS, or Mac Catalyst) and shows the result. On my Mac, the offset between tracks is ~720 samples. On others' Macs, I've seen as much as 1500 samples offset.
On my iPhone, I can get it close to sample-perfect by using AVAudioSession's outputLatency + inputLatency. However, the same formula leaves things misaligned on my iPad.
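For reference, the iOS-only adjustment that gets me close on iPhone is essentially this (a sketch; sampleRate stands for whatever rate the recorded buffer uses):
import AVFoundation

// Convert the session's reported round-trip latency into a sample offset
// to apply when scheduling the recorded input buffer.
let session = AVAudioSession.sharedInstance()
let roundTripLatency = session.inputLatency + session.outputLatency   // seconds
let latencyOffsetInSamples = roundTripLatency * sampleRate            // sampleRate: placeholder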
What's the magic formula for syncing the input and output timestamps on each platform? I know it may be different on each, which is fine, and I know I won't get 100% accuracy, but I would like to get as close as possible before going through my own calibration process.
Here's a sample of my current code (full sync logic can be found at https://github.com/jnpdx/AudioEngineLoopbackLatencyTest/blob/main/AudioEngineLoopbackLatencyTest/AudioManager.swift):
//Schedule playback of original audio during initial playback
let delay = 0.33 * state.secondsToTicks
let audioTime = AVAudioTime(hostTime: mach_absolute_time() + UInt64(delay))
state.audioBuffersScheduledAtHost = audioTime.hostTime
...
//in the inputNode's inputTap, store the first timestamp
audioEngine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (pcmBuffer, timestamp) in
    if self.state.inputNodeTapBeganAtHost == 0 {
        self.state.inputNodeTapBeganAtHost = timestamp.hostTime
    }
}
...
//after playback, attempt to reconcile/sync the timestamps recorded above
let timestampToSyncTo = state.audioBuffersScheduledAtHost
let inputNodeHostTimeDiff = Int64(state.inputNodeTapBeganAtHost) - Int64(timestampToSyncTo)
let inputNodeDiffInSamples = Double(inputNodeHostTimeDiff) / state.secondsToTicks * inputFileBuffer.format.sampleRate //secondsToTicks is calculated using mach_timebase_info
//play the original metronome audio at sample position 0 and try to sync everything else up to it
let originalAudioTime = AVAudioTime(sampleTime: 0, atRate: renderingEngine.mainMixerNode.outputFormat(forBus: 0).sampleRate)
originalAudioPlayerNode.scheduleBuffer(metronomeFileBuffer, at: originalAudioTime, options: []) {
    print("Played original audio")
}
//play the tap of the input node at its determined sync time -- this _does not_ appear to line up in the result file
let inputAudioTime = AVAudioTime(sampleTime: AVAudioFramePosition(inputNodeDiffInSamples), atRate: renderingEngine.mainMixerNode.outputFormat(forBus: 0).sampleRate)
recordedInputNodePlayer.scheduleBuffer(inputFileBuffer, at: inputAudioTime, options: []) {
    print("Input buffer played")
}
When running the sample app, here's the result I get (result screenshot not included here):
This answer is applicable to native macOS only
General Latency Determination
Output
In the general case the output latency for a stream on a device is determined by the sum of the following properties:
kAudioDevicePropertySafetyOffset
kAudioStreamPropertyLatency
kAudioDevicePropertyLatency
kAudioDevicePropertyBufferFrameSize
The device safety offset, stream, and device latency values should be retrieved for kAudioObjectPropertyScopeOutput.
On my Mac for the audio device MacBook Pro Speakers at 44.1 kHz this equates to 71 + 424 + 11 + 512 = 1018 frames.
Input
Similarly, the input latency is determined by the sum of the following properties:
kAudioDevicePropertySafetyOffset
kAudioStreamPropertyLatency
kAudioDevicePropertyLatency
kAudioDevicePropertyBufferFrameSize
The device safety offset, stream, and device latency values should be retrieved for kAudioObjectPropertyScopeInput.
On my Mac for the audio device MacBook Pro Microphone at 44.1 kHz this equates to 114 + 2404 + 40 + 512 = 3070 frames.
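A sketch of how these values can be read and summed with AudioObjectGetPropertyData (macOS only, minimal error handling; kAudioObjectPropertyElementMain assumes macOS 12+, use kAudioObjectPropertyElementMaster on older systems):
import CoreAudio

// Sum the latency components for one scope of a device, in frames.
// Stream latency is taken from the device's first stream.
func totalLatencyFrames(deviceID: AudioObjectID, scope: AudioObjectPropertyScope) -> UInt32 {
    func deviceProperty(_ selector: AudioObjectPropertySelector) -> UInt32 {
        var value: UInt32 = 0
        var size = UInt32(MemoryLayout<UInt32>.size)
        var address = AudioObjectPropertyAddress(mSelector: selector,
                                                 mScope: scope,
                                                 mElement: kAudioObjectPropertyElementMain)
        AudioObjectGetPropertyData(deviceID, &address, 0, nil, &size, &value)
        return value
    }

    // kAudioStreamPropertyLatency lives on the stream object, not the device.
    var streamLatency: UInt32 = 0
    var streamsAddress = AudioObjectPropertyAddress(mSelector: kAudioDevicePropertyStreams,
                                                    mScope: scope,
                                                    mElement: kAudioObjectPropertyElementMain)
    var streamsSize: UInt32 = 0
    AudioObjectGetPropertyDataSize(deviceID, &streamsAddress, 0, nil, &streamsSize)
    let streamCount = Int(streamsSize) / MemoryLayout<AudioStreamID>.size
    if streamCount > 0 {
        var streams = [AudioStreamID](repeating: 0, count: streamCount)
        AudioObjectGetPropertyData(deviceID, &streamsAddress, 0, nil, &streamsSize, &streams)
        var size = UInt32(MemoryLayout<UInt32>.size)
        var latencyAddress = AudioObjectPropertyAddress(mSelector: kAudioStreamPropertyLatency,
                                                        mScope: kAudioObjectPropertyScopeGlobal,
                                                        mElement: kAudioObjectPropertyElementMain)
        AudioObjectGetPropertyData(streams[0], &latencyAddress, 0, nil, &size, &streamLatency)
    }

    let deviceSum = deviceProperty(kAudioDevicePropertySafetyOffset) +
                    deviceProperty(kAudioDevicePropertyLatency) +
                    deviceProperty(kAudioDevicePropertyBufferFrameSize)
    return deviceSum + streamLatency
}

// Usage, matching the sums above:
// let outputFrames = totalLatencyFrames(deviceID: outputDeviceID, scope: kAudioObjectPropertyScopeOutput)
// let inputFrames  = totalLatencyFrames(deviceID: inputDeviceID,  scope: kAudioObjectPropertyScopeInput)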
AVAudioEngine
How the information above relates to AVAudioEngine is not immediately clear. Internally AVAudioEngine creates a private aggregate device and Core Audio essentially handles latency compensation for aggregate devices automatically.
During experimentation for this answer I've found that some (most?) audio devices don't report latency correctly. At least that is how it seems, which makes accurate latency determination nigh impossible.
I was able to get fairly accurate synchronization using my Mac's built-in audio using the following adjustments:
// Some non-zero value to get AVAudioEngine running
let startDelay = 0.1
// The original audio file start time
let originalStartingFrame: AVAudioFramePosition = AVAudioFramePosition(playerNode.outputFormat(forBus: 0).sampleRate * startDelay)
// The output tap's first sample is delivered to the device after the buffer is filled once
// A number of zero samples equal to the buffer size is produced initially
let outputStartingFrame: AVAudioFramePosition = Int64(state.outputBufferSizeFrames)
// The first output sample makes it way back into the input tap after accounting for all the latencies
let inputStartingFrame: AVAudioFramePosition = outputStartingFrame - Int64(state.outputLatency + state.outputStreamLatency + state.outputSafetyOffset + state.inputSafetyOffset + state.inputLatency + state.inputStreamLatency)
On my Mac the values reported by the AVAudioEngine aggregate device were:
// Output:
// kAudioDevicePropertySafetyOffset: 144
// kAudioDevicePropertyLatency: 11
// kAudioStreamPropertyLatency: 424
// kAudioDevicePropertyBufferFrameSize: 512
// Input:
// kAudioDevicePropertySafetyOffset: 154
// kAudioDevicePropertyLatency: 0
// kAudioStreamPropertyLatency: 2404
// kAudioDevicePropertyBufferFrameSize: 512
which equated to the following offsets:
originalStartingFrame = 4410
outputStartingFrame = 512
inputStartingFrame = -2625
I may not be able to answer your question, but I believe there is a property not mentioned in your question that does report additional latency information.
I've only worked at the HAL/AUHAL layers (never AVAudioEngine), but in discussions about computing the overall latencies, some audio device/stream properties come up: kAudioDevicePropertyLatency and kAudioStreamPropertyLatency.
Poking around a bit, I see those properties mentioned in the documentation for AVAudioIONode's presentationLatency property (https://developer.apple.com/documentation/avfoundation/avaudioionode/1385631-presentationlatency). I expect that the hardware latency reported by the driver will show up there. (I suspect that the standard latency property reports the latency for an input sample to appear in the output of a "normal" node, and that the IO case is special.)
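For what it's worth, this is where those values should surface in AVAudioEngine (I haven't verified exactly what the driver reports through them):
import AVFoundation

// presentationLatency on the IO nodes is documented in terms of kAudioDevicePropertyLatency
// and kAudioStreamPropertyLatency; latency is the generic AVAudioNode property.
let engine = AVAudioEngine()
print(engine.inputNode.presentationLatency)   // seconds
print(engine.outputNode.presentationLatency)  // seconds
print(engine.inputNode.latency)               // seconds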
It's not in the context of AVAudioEngine, but here's one message from the CoreAudio mailing list that talks a bit about using the low level properties that may provide some additional background: https://lists.apple.com/archives/coreaudio-api/2017/Jul/msg00035.html
I'm writing an application where I should play parts of audio files. Each audio file contains audio data for a separate track.
These parts are sections with a begin time and an end time, and I'm trying to play those parts in an order I choose.
So for example, imagine I have 4 sections :
A - B - C - D
and I activate B and D, I want to play B, then D, then B again, then D, etc.
To make smooth 'jumps' in playback, I think it's important to fade the start/end buffers of each section in and out.
So, I have a basic AVAudioEngine setup, with AVAudioPlayerNode, and a mixer.
For each audio section, I cache some information:
a buffer for the first samples in the section (which I fade in manually)
a tuple with the AVAudioFramePosition and AVAudioFrameCount of the middle segment
a buffer for the last samples in the section (which I fade out manually)
Now, when I schedule a section for playback, I tell the AVAudioPlayerNode to:
schedule the start buffer (scheduleBuffer(_:completionHandler:) no option)
schedule the middle segment (scheduleSegment(_:startingFrame:frameCount:at:completionHandler:))
finally schedule the end buffer (scheduleBuffer(_:completionHandler:) no option)
all at a "time" of nil. (A rough sketch of this sequence is shown below.)
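Sketch of that scheduling sequence (player is the AVAudioPlayerNode, already attached and connected; startBuffer, endBuffer, middleSegment and file match the cached values described further down):
// Start buffer, then the middle segment straight from the file, then the end buffer.
player.scheduleBuffer(startBuffer, completionHandler: nil)
player.scheduleSegment(file,
                       startingFrame: middleSegment.0,   // AVAudioFramePosition
                       frameCount: middleSegment.1,      // AVAudioFrameCount
                       at: nil,
                       completionHandler: nil)
player.scheduleBuffer(endBuffer, completionHandler: nil)
player.play()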
The problem is that I can hear clicks and crappy sounds at the section boundaries, and I can't see what I'm doing wrong.
My first suspect was the fades I do manually (basically multiplying sample values by a volume factor), but I get the same result without them.
I also thought I might not be scheduling early enough, but scheduling sections in advance (A - B - C, for example) gives the same result.
I then tried different frame position computations using the audio format settings; same result.
So I'm out of ideas here, and perhaps I didn't get the schedule mechanism right.
Can anyone confirm that I can mix scheduling buffers and segments in AVAudioPlayerNode, or should I schedule only buffers or only segments?
I can confirm that scheduling only segments works; playback is perfectly fine.
A little context on how I cache the information for audio sections...
In the code below, file is an AVAudioFile loaded from a URL on disk, and begin and end are TimeInterval values representing the start/end of my audio section.
let format = file.processingFormat
let startBufferFrameCount: AVAudioFrameCount = 4096
let endBufferFrameCount: AVAudioFrameCount = 4096
let audioSectionStartFrame = framePosition(at: begin, format: format)
let audioSectionEndFrame = framePosition(at: end, format: format)
let segmentStartFrame = audioSectionStartFrame + AVAudioFramePosition(startBufferFrameCount)
let segmentEndFrame = audioSectionEndFrame - AVAudioFramePosition(endBufferFrameCount)
startBuffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: startBufferFrameCount)
endBuffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: endBufferFrameCount)
file.framePosition = audioSectionStartFrame
try file.read(into: startBuffer)
file.framePosition = segmentEndFrame
try file.read(into: endBuffer)
middleSegment = (segmentStartFrame, AVAudioFrameCount(segmentEndFrame - segmentStartFrame))
frameCount = AVAudioFrameCount(audioSectionEndFrame - audioSectionStartFrame)
Also, the framePosition(at:format:) multiplies the TimeInterval value by the sample rate of the AVAudioFormat passed in.
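For reference, that helper boils down to something like this:
// framePosition(at:format:) as described above: seconds times sample rate.
func framePosition(at time: TimeInterval, format: AVAudioFormat) -> AVAudioFramePosition {
    return AVAudioFramePosition(time * format.sampleRate)
}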
I cache this information for every audio section, but I hear clicks at section boundaries, no matter if I schedule them in advance or not.
I also tried not mixing buffers and segments when scheduling, but it doesn't change anything, so I'm starting to think my frame computations are wrong.
The one who solves this has to have the Sherlock Holmes trophy. Here it goes.
I'm using AudioQueues to record sound (LPCM, SInt16, 4 buffers). In the callback, I tried measuring the mean amplitude by converting the samples to float and using vDSP_meamgv. Here are some example means:
Mean, No of samples
44.400364, 44100
36.077393, 44100
27.672422, 41984
2889.821289, 44100
57.481972, 44100
58.967506, 42872
54.691631, 44100
2894.467285, 44100
62.697800, 42872
63.732948, 44100
66.575623, 44100
2979.566406, 42872
As you can see, every fourth (last) buffer is wild. I looked at the separate samples, there are lots of 0's and lots of huge numbers, and no normal numbers, like for the other buffers. Things get more interesting. If I use 3 buffers instead, the third one (always the last) is a bogey. And this holds for any number of buffers I choose.
I put an if in the callback so the wild buffer doesn't get re-enqueued, and once it's gone there are no more huge numbers; the other buffers continue to fill normally. I added a button that re-enqueues that buffer after it has been dropped, and once I re-enqueue it, it again gets filled with gigantic samples (that same buffer!)
And now the cherry on top: I put my mean-calculation code into other projects, like Apple's SpeakHere, and the same thing happens there o.O, although the app works fine, recording and playing back what was recorded.
I just don't get it; I've racked my brain trying to figure this one out. If somebody has a clue...
Here's the callback, if it helps:
void Recorder::MyInputBufferHandler(void *                               inUserData,
                                    AudioQueueRef                        inAQ,
                                    AudioQueueBufferRef                  inBuffer,
                                    const AudioTimeStamp *               inStartTime,
                                    UInt32                               inNumPackets,
                                    const AudioStreamPacketDescription * inPacketDesc) {
    Recorder *eu = (Recorder *)inUserData;
    vDSP_vflt16((SInt16 *)inBuffer->mAudioData, 1, eu->conveier, 1, inBuffer->mAudioDataByteSize);
    float mean;
    vDSP_meamgv(eu->conveier, 1, &mean, inBuffer->mAudioDataByteSize);
    printf("values: %f, %d\n", mean, inBuffer->mAudioDataByteSize);
    // if (mean < 2300)
    AudioQueueEnqueueBuffer(inAQ, inBuffer, 0, NULL);
}
'conveier' is a float array I've preallocated.
It's also me who gets the trophy. The error was that the vDSP functions shouldn't have been given mAudioDataByteSize, because they need the number of ELEMENTS in the array. In my case each element (SInt16) is 2 bytes, so I should have passed mAudioDataByteSize / 2. When it read the last buffer, it ran off the end by that extra length and counted some random data. Voilà! A very basic mistake, but when you're looking in all the wrong places, it doesn't seem like one.
For anybody that stepped on the same rake...
PS. It came to me while taking a bath :)
I've specified and instantiated two Audio Units: a multichannel mixer unit and a generator of subtype AudioFilePlayer.
I would have thought I needed to set the ASBD of the filePlayer's output to match the ASBD I set for the mixer input. However, when I attempt to set the filePlayer's output, I get a kAudioUnitErr_FormatNotSupported (-10868) error.
Here's the stream format I set on the mixer input (successfully) and am also trying to set on the filePlayer (it's the monostream format copied from Apple's mixerhost sample project):
Sample Rate: 44100
Format ID: lpcm
Format Flags: C
Bytes per Packet: 2
Frames per Packet: 1
Bytes per Frame: 2
Channels per Frame: 1
Bits per Channel: 16
In the course of troubleshooting this I queried the filePlayer AU for the format it is 'natively' set to. This is what's returned:
Sample Rate: 44100
Format ID: lpcm
Format Flags: 29
Bytes per Packet: 4
Frames per Packet: 1
Bytes per Frame: 4
Channels per Frame: 2
Bits per Channel: 32
All the example code I've found sends the output of the filePlayer unit to an effect unit and sets the filePlayer's output to match the ASBD set for the effect unit. Given that I have no effect unit, it seems like setting the filePlayer's output to the mixer input's ASBD would be the correct - and required - thing to do.
How have you configured the AUGraph? I might need to see some code to help you out.
Setting the ASBD on the output scope of the AUMultiChannelMixer once only (as in MixerHost) works. However, if you have any kind of effect at all, you will need to think about where the effects' ASBDs are defined and how you arrange your code so Core Audio does not jump in and mess with your effect AudioUnits' ASBDs. By 'messing with' I mean overriding your ASBD with the default: kAudioFormatFlagIsFloat, kAudioFormatFlagIsPacked, 2 channels, non-interleaved. This was a big pain for me at first.
I would set the effect AudioUnits to their default ASBD. Assuming you have connected the AUFilePlayer node, you can pull the unit out later in the program like this:
result = AUGraphNodeInfo(processingGraph,
                         filePlayerNode,
                         NULL,
                         &filePlayerUnit);
And then proceed to set
AudioUnitSetProperty(filePlayerUnit,
                     kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Output,
                     0,
                     &monoStreamFormat,
                     sizeof(monoStreamFormat));
Hopefully this helps.
Basically I didn't bother setting the filePlayer ASBD but rather retrieved the 'native' ASBD it was set to and updated only the sample rate and channel count.
Likewise, I didn't set the input format on the mixer and let the mixer figure its format out.
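The gist of that approach, sketched here in Swift (the same AudioUnit C calls work from Objective-C; filePlayerUnit is the unit pulled out via AUGraphNodeInfo as above, and the specific values are just what my project needed):
import AudioToolbox

// Read the unit's current output format, change only what needs to match, write it back.
var asbd = AudioStreamBasicDescription()
var size = UInt32(MemoryLayout<AudioStreamBasicDescription>.size)
AudioUnitGetProperty(filePlayerUnit,
                     kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Output,
                     0,
                     &asbd,
                     &size)
asbd.mSampleRate = 44100       // match the mixer input
asbd.mChannelsPerFrame = 1
AudioUnitSetProperty(filePlayerUnit,
                     kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Output,
                     0,
                     &asbd,
                     UInt32(MemoryLayout<AudioStreamBasicDescription>.size))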