Varispeed with libsndfile, libsamplerate and PortAudio in C

I'm working on an audio visualizer in C with OpenGL, libsamplerate, PortAudio, and libsndfile. I'm having difficulty using src_process correctly within this setup. My goal is to use src_process to achieve vinyl-like varispeed in real time within the visualizer. Right now my implementation changes the pitch of the audio without changing the speed. It also introduces a lot of distortion that sounds like missing frames: when I lower the speed with src_ratio, the output sounds almost granular, like chopped-up samples. I keep experimenting with my buffer chunk sizes, but 9 times out of 10 I get a libsamplerate error saying my input and output arrays overlap. I've also been looking at the speed-change example that ships with libsamplerate and I can't find where I went wrong. Any help would be appreciated.
Here's the code I believe is relevant. Thanks, and let me know if I can be more specific; this semester was my first experience with C and with programming.
#define FRAMES_PER_BUFFER 1024
#define ITEMS_PER_BUFFER (FRAMES_PER_BUFFER * 2)
float src_inBuffer[ITEMS_PER_BUFFER];
float src_outBuffer[ITEMS_PER_BUFFER];
void initialize_SRC_DATA()
{
data.src_ratio = 1; //Sets Default Playback Speed
/*---------------*/
data.src_data.data_in = data.src_inBuffer; //Point to SRC inBuffer
data.src_data.data_out = data.src_outBuffer; //Point to SRC OutBuffer
data.src_data.input_frames = 0; //Start with Zero to Force Load
data.src_data.output_frames = ITEMS_PER_BUFFER
/ data.sfinfo1.channels; //Number of Frames to Write Out
data.src_data.src_ratio = data.src_ratio; //Sets Default Playback Speed
}
/* Open audio stream */
err = Pa_OpenStream( &g_stream,
NULL,
&outputParameters,
data.sfinfo1.samplerate,
FRAMES_PER_BUFFER,
paNoFlag,
paCallback,
&data );
/* Read FramesPerBuffer Amount of Data from inFile into buffer[] */
numberOfFrames = sf_readf_float(data->inFile, data->src_inBuffer, framesPerBuffer);
/* Looping of inFile if EOF is Reached */
if (numberOfFrames < framesPerBuffer)
{
sf_seek(data->inFile, 0, SEEK_SET);
numberOfFrames += sf_readf_float(data->inFile, /* add to the frames already read before the wrap */
data->src_inBuffer+(numberOfFrames*data->sfinfo1.channels),
framesPerBuffer-numberOfFrames);
}
/* Inform SRC Data How Many Input Frames To Process */
data->src_data.end_of_input = 0;
data->src_data.input_frames = numberOfFrames;
/* Perform SRC Modulation, Processed Samples are in src_outBuffer[] */
if ((data->src_error = src_process (data->src_state, &data->src_data))) {
printf ("\nError : %s\n\n", src_strerror (data->src_error)) ;
exit (1);
}
/* Write Processed SRC Data to Audio Out and Visual Out */
for (i = 0; i < framesPerBuffer * data->sfinfo1.channels; i++)
{
// gl_audioBuffer[i] = data->src_outBuffer[i] * data->amplitude;
out[i] = data->src_outBuffer[i] * data->amplitude;
}

I figured out a solution that works well enough for me, and I'll explain it as best I can for anyone else with a similar issue. The way the API works is that you give it a certain number of input frames and it produces a certain number of output frames. For an SRC ratio of 0.5, if you output 512 frames per loop, you have to feed in 512/0.5 = 1024 frames. When src_process runs, it compresses those 1024 frames into 512, which speeds up playback. I don't fully understand why this solved my issue, but the problem appeared when the ratio was, say, 0.7: the number of input frames required is then not a whole number, while frame counts and array indices have to be integers. Unless framesPerBuffer is evenly divisible by the SRC ratio, samples can therefore go missing at the end of each block. What I did was read 2 extra frames whenever fmod(framesPerBuffer, src_ratio) != 0, and that seemed to fix 99% of the glitches.
/* This if Statement Ensures Smooth VariSpeed Output */
if (fmod((double)framesPerBuffer, data->src_data.src_ratio) == 0)
{
    numInFrames = framesPerBuffer;
}
else
{
    numInFrames = (framesPerBuffer/data->src_data.src_ratio) + 2;
}
/* Read FramesPerBuffer Amount of Data from inFile into buffer[] */
numberOfFrames = sf_readf_float(data->inFile, data->src_inBuffer, numInFrames);
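The frame arithmetic described above can be sketched as a small helper. This is only an illustration under stated assumptions: `input_frames_needed` is a hypothetical name, not part of libsamplerate, and it simply rounds the required input count up to a whole frame instead of adding a fixed 2. After each call to src_process, libsamplerate also reports how much it really consumed and produced in the `input_frames_used` and `output_frames_gen` fields of SRC_DATA, so the caller can check those instead of assuming the whole input block was used.

```c
#include <assert.h>

/* Hypothetical helper (not a libsamplerate function).
 * Returns how many input frames must be supplied so that a converter
 * running at `ratio` (output rate / input rate) can produce `out_frames`
 * output frames. The result is rounded up to a whole frame. */
static long input_frames_needed(long out_frames, double ratio)
{
    assert(out_frames >= 0 && ratio > 0.0);

    /* Truncating division gives the whole-frame part. */
    long n = (long)((double)out_frames / ratio);

    /* If that many input frames would yield too few output frames,
     * one more input frame is required. */
    if ((double)n * ratio < (double)out_frames)
        n++;

    return n;
}
```

With a ratio of 0.5, 512 output frames need 1024 input frames; with a ratio of 0.7, 1024 output frames need 1463. The input buffer has to be large enough for the smallest ratio the program allows, since the required input grows as the ratio shrinks.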

Related

STM32 - Reading I2S to record a .WAV file. Audio choppy, what is causing it?

I'm using an STM32 (STM32F446RE) to receive audio from two INMP441 MEMS microphones in a stereo setup via the I2S protocol and record it to a .WAV file on a micro SD card, using the HAL library.
I wrote the firmware that records audio into a .WAV with FreeRTOS. But the audio files that I record sound like Darth Vader. Here is a screenshot of the audio in Audacity:
If you zoom in, you can see a constant noise being inserted in between the real audio data:
I don't know what is causing this.
I have tried increasing the size of the message queue, but that doesn't seem to be the problem; the queue count stays at 0 most of the time. I've tried different frame sizes and sampling rates, changing the number of channels, and using only one INMP441, all without success.
I'll now explain the firmware.
Here is a block diagram of the architecture for the RTOS that I have implemented:
It consists of three tasks. The first one receives a command via UART (with interrupts) that signals to start or stop recording. The second one is simply a state machine that walks through the steps to write a .WAV.
Here the code for the WriteWavFileTask:
switch(audio_state)
{
case STATE_START_RECORDING:
sprintf(filename, "%saud_%03d.wav", SDPath, count++);
do
{
res = f_open(&file_ptr, filename, FA_CREATE_ALWAYS|FA_WRITE);
}
while(res != FR_OK);
res = fwrite_wav_header(&file_ptr, I2S_SAMPLE_FREQUENCY, I2S_FRAME, 2);
HAL_I2S_Receive_DMA(&hi2s2, aud_buf, READ_SIZE);
audio_state = STATE_RECORDING;
break;
case STATE_RECORDING:
osDelay(50);
break;
case STATE_STOP:
HAL_I2S_DMAStop(&hi2s2);
while(osMessageQueueGetCount(AudioQueueHandle)) osDelay(1000);
filesize = f_size(&file_ptr);
data_len = filesize - 44;
total_len = filesize - 8;
f_lseek(&file_ptr, 4);
f_write(&file_ptr, (uint8_t*)&total_len, 4, bw);
f_lseek(&file_ptr, 40);
f_write(&file_ptr, (uint8_t*)&data_len, 4, bw);
f_close(&file_ptr);
audio_state = STATE_IDLE;
break;
case STATE_IDLE:
osThreadSuspend(WAVHandle);
audio_state = STATE_START_RECORDING;
break;
default:
osDelay(50);
break;
Here are the macros used in the code for readability:
#define I2S_DATA_WORD_LENGTH (24) // industry-standard 24-bit I2S
#define I2S_FRAME (32) // bits per sample
#define READ_SIZE (128) // samples to read from I2S
#define WRITE_SIZE (READ_SIZE*I2S_FRAME/16) // half words to write
#define WRITE_SIZE_BYTES (WRITE_SIZE*2) // bytes to write
#define I2S_SAMPLE_FREQUENCY (16000) // sample frequency
The last task is responsible for processing the buffer received via I2S. Here is the code:
void convert_endianness(uint32_t *array, uint16_t Size) {
for (int i = 0; i < Size; i++) {
array[i] = __REV(array[i]);
}
}
void HAL_I2S_RxCpltCallback(I2S_HandleTypeDef *hi2s)
{
convert_endianness((uint32_t *)aud_buf, READ_SIZE);
osMessageQueuePut(AudioQueueHandle, aud_buf, 0L, 0);
HAL_I2S_Receive_DMA(hi2s, aud_buf, READ_SIZE);
}
void pvrWriteAudioTask(void *argument)
{
/* USER CODE BEGIN pvrWriteAudioTask */
static UINT *bw;
static uint16_t aud_ptr[WRITE_SIZE];
/* Infinite loop */
for(;;)
{
osMessageQueueGet(AudioQueueHandle, aud_ptr, 0L, osWaitForever);
res = f_write(&file_ptr, aud_ptr, WRITE_SIZE_BYTES, bw);
}
/* USER CODE END pvrWriteAudioTask */
}
This task reads from a queue an array of 256 uint16_t elements containing the raw PCM audio data. f_write takes its size parameter in bytes, so it writes 512 bytes to the SD card. The I2S receives 128 frames (for a 32-bit frame, 128 words).
The following is the configuration for the I2S and clocks:
Any help would be much appreciated!
Solution
As pmacfarlane pointed out, the problem was with the method used for buffering the audio data. The solution consisted of easing the overhead on the ISR and implementing a circular DMA for double buffering. Here is the code:
#define I2S_DATA_WORD_LENGTH (24) // industry-standard 24-bit I2S
#define I2S_FRAME (32) // bits per sample
#define READ_SIZE (128) // samples to read from I2S
#define BUFFER_SIZE (READ_SIZE*I2S_FRAME/16) // number of uint16_t elements expected
#define WRITE_SIZE_BYTES (BUFFER_SIZE*2) // bytes to write
#define I2S_SAMPLE_FREQUENCY (16000) // sample frequency
uint16_t aud_buf[2*BUFFER_SIZE]; // Double buffering
static uint16_t * volatile BufPtr; // the pointer itself is volatile: set in the ISRs, read in the task
void convert_endianness(uint32_t *array, uint16_t Size) {
for (int i = 0; i < Size; i++) {
array[i] = __REV(array[i]);
}
}
void HAL_I2S_RxHalfCpltCallback(I2S_HandleTypeDef *hi2s)
{
BufPtr = aud_buf;
osSemaphoreRelease(RxAudioSemHandle);
}
void HAL_I2S_RxCpltCallback(I2S_HandleTypeDef *hi2s)
{
BufPtr = &aud_buf[BUFFER_SIZE];
osSemaphoreRelease(RxAudioSemHandle);
}
void pvrWriteAudioTask(void *argument)
{
/* USER CODE BEGIN pvrWriteAudioTask */
UINT bw; // receives the number of bytes actually written
/* Infinite loop */
for(;;)
{
osSemaphoreAcquire(RxAudioSemHandle, osWaitForever);
convert_endianness((uint32_t *)BufPtr, READ_SIZE);
res = f_write(&file_ptr, BufPtr, WRITE_SIZE_BYTES, &bw);
}
/* USER CODE END pvrWriteAudioTask */
}
Problems
I think the problem is your method of buffering the audio data - mainly in this function:
void HAL_I2S_RxCpltCallback(I2S_HandleTypeDef *hi2s)
{
convert_endianness((uint32_t *)aud_buf, READ_SIZE);
osMessageQueuePut(AudioQueueHandle, aud_buf, 0L, 0);
HAL_I2S_Receive_DMA(hi2s, aud_buf, READ_SIZE);
}
The main problem is that you are re-using the same buffer each time. You have queued a message to save aud_buf to the SD-card, but you've also instructed the I2S to start DMAing data into that same buffer, before it has been saved. You'll end up saving some kind of mish-mash of "old" data and "new" data.
@Flexz pointed out that the message queue takes a copy of the data, so there is no issue about the I2S writing over the data that is being written to the SD-card. However, taking the copy (in an ISR) adds overhead, and delays the start of the new I2S DMA.
Another problem is that you are doing the endian conversion in this function (that is called from an ISR). This will block any other (lower priority) interrupts from being serviced while this happens, which is a bad thing in an embedded system. You should do the endian conversion in the task that reads from the queue. ISRs should be very short and do the minimum possible work (often just setting a flag, giving a semaphore, or adding something to a queue).
Lastly, while you are doing the endian conversion, what is happening to audio samples? The previous DMA has completed, and you haven't started a new one, so they will just be dropped on the floor.
Possible solution
You probably want to allocate a suitably big buffer, and configure your DMA to work in circular buffer mode. This means that once started, the DMA will continue forever (until you stop it), so you'll never drop any samples. There won't be any gap between one DMA finishing and a new one starting, since you never need to start a new one.
The DMA provides a "half-complete" interrupt, to say when it has filled half the buffer. So start the DMA, and when you get the half-complete interrupt, queue up the first half of the buffer to be saved. When you get the fully-complete interrupt, queue up the second half of the buffer to be saved. Rinse and repeat.
You might want to add some logic to detect if the interrupt happens before the previous save has completed, since the data will be overrun and possibly corrupted. Depending on the speed of the SD-card (and the sample rate), this may or may not be a problem.
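The overrun check suggested above could look roughly like the following. It is a minimal sketch in plain C rather than HAL code: `pingpong_t`, `HALF_WORDS` and the function names are all hypothetical, and the real callbacks would still need to wake the writer task (for example with a semaphore). The idea is that a half stays marked as pending until the writer has finished saving it, so a DMA event that arrives while something is still pending means the writer fell behind.

```c
#include <assert.h>
#include <stdint.h>

#define HALF_WORDS 128  /* 32-bit words per half buffer (assumed size) */

typedef struct {
    uint32_t buf[2 * HALF_WORDS]; /* filled continuously by the circular DMA */
    volatile int pending;         /* -1: nothing to save, 0 or 1: half awaiting save */
    volatile uint32_t overruns;   /* DMA events that arrived before a save finished */
} pingpong_t;

static void pingpong_init(pingpong_t *pp)
{
    pp->pending = -1;
    pp->overruns = 0;
}

/* Call from the half-complete (half = 0) and complete (half = 1) callbacks. */
static void pingpong_on_dma_event(pingpong_t *pp, int half)
{
    assert(half == 0 || half == 1);
    if (pp->pending != -1)
        pp->overruns++;  /* the writer has not finished the previous half */
    pp->pending = half;
}

/* Call from the writer task: returns the half to save, or -1 if none.
 * The data for half h starts at &pp->buf[h * HALF_WORDS]. */
static int pingpong_begin_save(const pingpong_t *pp)
{
    return pp->pending;
}

/* Call from the writer task once the save of `half` has completed. */
static void pingpong_end_save(pingpong_t *pp, int half)
{
    if (pp->pending == half)
        pp->pending = -1;  /* leave a newer pending half untouched */
}
```

A non-zero `overruns` count after a recording would suggest that the SD card writes are too slow for the chosen buffer size and sample rate, in which case a larger buffer is the first thing to try.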

Could NaN be causing the occasional crash in this Core Audio iOS app?

My first app synthesised music audio from a sine look-up table using methods deprecated since iOS 6. I have just revised it to address warnings about AudioSession, helped by this blog and the Apple guidelines on the AVFoundation framework. The Audio Session warnings have now been addressed and the app produces audio as it did before. It currently runs under iOS 9.
However the app occasionally crashes for no apparent reason. I checked out this SO post but it seems to deal with accessing rather than generating raw audio data, so maybe it is not dealing with a timing issue. I suspect there is a buffering problem but I need to understand what this might be before I change or fine tune anything in the code.
I have a deadline to make the revised app available to users, so I'd be most grateful to hear from someone who has dealt with a similar issue.
Here is the issue. The app goes into debug on the simulator reporting:
com.apple.coreaudio.AQClient (8):EXC_BAD_ACCESS (code=1, address=0xffffffff10626000)
In the Debug Navigator, Thread 8 (com.apple.coreaudio.AQClient (8)), it reports:
0 -[Synth fillBuffer:frames:]
1 -[PlayView audioBufferPlayer:fillBuffer:format:]
2 playCallback
This line of code in fillBuffer is highlighted
float sineValue = (1.0f - b)*sine[a] + b*sine[c];
... and so is this line of code in audioBufferPlayer
int packetsWritten = [synth fillBuffer:buffer->mAudioData frames:packetsPerBuffer];
... and playCallBack
[player.delegate audioBufferPlayer:player fillBuffer:inBuffer format:player.audioFormat];
Here is the code for audioBufferPlayer (delegate, essentially the same as in the demo referred to above).
- (void)audioBufferPlayer:(AudioBufferPlayer*)audioBufferPlayer fillBuffer:(AudioQueueBufferRef)buffer format:(AudioStreamBasicDescription)audioFormat
{
[synthLock lock];
int packetsPerBuffer = buffer->mAudioDataBytesCapacity / audioFormat.mBytesPerPacket;
int packetsWritten = [synth fillBuffer:buffer->mAudioData frames:packetsPerBuffer];
buffer->mAudioDataByteSize = packetsWritten * audioFormat.mBytesPerPacket;
[synthLock unlock];
}
... (initialised in myViewController)
- (id)init
{
if ((self = [super init])) {
// The audio buffer is managed (filled up etc.) within its own thread (Audio Queue thread)
// Since we are also responding to changes from the GUI, we need a lock so both threads
// do not attempt to change the same value independently.
synthLock = [[NSLock alloc] init];
// Synth and the AudioBufferPlayer must use the same sample rate.
float sampleRate = 44100.0f;
// Initialise synth to fill the audio buffer with audio samples.
synth = [[Synth alloc] initWithSampleRate:sampleRate];
// Initialise note buttons
buttons = [[NSMutableArray alloc] init];
// Initialise the audio buffer.
player = [[AudioBufferPlayer alloc] initWithSampleRate:sampleRate channels:1 bitsPerChannel:16 packetsPerBuffer:1024];
player.delegate = self;
player.gain = 0.9f;
[[AVAudioSession sharedInstance] setActive:YES error:nil];
}
return self;
} // initialisation
... and for playCallback
static void playCallback( void* inUserData, AudioQueueRef inAudioQueue, AudioQueueBufferRef inBuffer)
{
AudioBufferPlayer* player = (AudioBufferPlayer*) inUserData;
if (player.playing){
[player.delegate audioBufferPlayer:player fillBuffer:inBuffer format:player.audioFormat];
AudioQueueEnqueueBuffer(inAudioQueue, inBuffer, 0, NULL);
}
}
... and here is the code for fillBuffer where audio is synthesised
- (int)fillBuffer:(void*)buffer frames:(int)frames
{
SInt16* p = (SInt16*)buffer;
// Loop through the frames (or "block size"), then consider each sample for each tone.
for (int f = 0; f < frames; ++f)
{
float m = 0.0f; // the mixed value for this frame
for (int n = 0; n < MAX_TONE_EVENTS; ++n)
{
if (tones[n].state == STATE_INACTIVE) // only active tones
continue;
// recalculate a 30sec envelope and place in a look-up table
// Longer notes need to interpolate through the envelope
int a = (int)tones[n].envStep; // integer part (like a floored float)
float b = tones[n].envStep - a; // decimal part (like doing a modulo)
// c allows us to calculate if we need to wrap around
int c = a + 1; // (like a ceiling of integer part)
if (c >= envLength) c = a; // don't wrap around
/////////////// LOOK UP ENVELOPE TABLE /////////////////
// uses table look-up with interpolation for both level and pitch envelopes
// 'b' is a value interpolated between 2 successive samples 'a' and 'c')
// first, read values for the level envelope
float envValue = (1.0f - b)*tones[n].levelEnvelope[a] + b*tones[n].levelEnvelope[c];
// then the pitch envelope
float pitchFactorValue = (1.0f - b)*tones[n].pitchEnvelope[a] + b*tones[n].pitchEnvelope[c];
// Advance envelope pointer one step
tones[n].envStep += tones[n].envDelta;
// Turn note off at the end of the envelope.
if (((int)tones[n].envStep) >= envLength){
tones[n].state = STATE_INACTIVE;
continue;
}
// Precalculated Sine look-up table
a = (int)tones[n].phase; // integer part
b = tones[n].phase - a; // decimal part
c = a + 1;
if (c >= sineLength) c -= sineLength; // wrap around
///////////////// LOOK UP OF SINE TABLE ///////////////////
float sineValue = (1.0f - b)*sine[a] + b*sine[c];
// Wrap round when we get to the end of the sine look-up table.
tones[n].phase += (tones[n].frequency * pitchFactorValue); // calculate frequency for each point in the pitch envelope
if (((int)tones[n].phase) >= sineLength)
tones[n].phase -= sineLength;
////////////////// RAMP NOTE OFF IF IT HAS BEEN UNPRESSED
if (tones[n].state == STATE_UNPRESSED) {
tones[n].gain -= 0.0001;
if ( tones[n].gain <= 0 ) {
tones[n].state = STATE_INACTIVE;
}
}
//////////////// FINAL SAMPLE VALUE ///////////////////
float s = sineValue * envValue * gain * tones[n].gain;
// Clip the signal, if needed.
if (s > 1.0f) s = 1.0f;
else if (s < -1.0f) s = -1.0f;
// Add the sample to the out-going signal
m += s;
}
// Write the sample mix to the buffer as a 16-bit word.
p[f] = (SInt16)(m * 0x7FFF);
}
return frames;
}
I'm not sure whether it is a red herring but I came across NaN in several debug registers. It appears to happen while calculating phase increment for sine lookup in fillBuffer (see above). That calculation is done for up to a dozen partials every sample at a sampling rate of 44.1 kHz and worked in iOS 4 on an iPhone 4. I'm running on simulator of iOS 9. The only changes I made are described in this post!
My NaN problem turned out to have nothing directly to do with Core Audio. It was caused by an edge condition introduced by changes in another area of my code. The real problem was a division by zero attempted while calculating the duration of the sound envelope in realtime.
However, in trying to identify the cause of that problem, I became confident that my pre-iOS 7 Audio Session has been replaced by a working setup based on AVFoundation. Thanks go to Matthijs Hollemans, the source of my initial code, and also to Mario Diana, whose blog explained the changes needed.
At first, the sound levels on my iPhone were significantly less than the sound levels on the Simulator, a problem addressed here by foundry. I found it necessary to include these improvements by replacing Mario's
- (BOOL)setUpAudioSession
with foundry's
- (void)configureAVAudioSession
Hopefully this might help someone else.

How do I increase the size of EZAudio EZMicrophone?

I would like to use the EZAudio framework to do realtime microphone signal FFT processing, along with some other processing in order to determine the peak frequency.
The problem is that the EZMicrophone class only appears to work on 512 samples, whereas my signal requires an FFT of 8192 or even 16384 samples. There doesn't appear to be a way to change the buffer size in EZMicrophone, but I've read posts that recommend creating an array of my target size, appending the microphone buffer to it, and doing the FFT once it is full.
When I do this, though, I get large chunks of memory with no data, or discontinuities between the segments of copied memory. I think it may have something to do with the timing or order in which the microphone delegate is being called, or with memory being overwritten from different threads, but I'm grasping at straws here. Am I correct in assuming that this code is executed every time the microphone buffer is full of a new 512 samples?
Can anyone suggest what I may be doing wrong? I've been stuck on this for a long time.
Here is the post I've been using as a reference:
EZAudio: How do you separate the buffersize from the FFT window size(desire higher frequency bin resolution).
// Global variables which are bad but I'm just trying to make things work
float tempBuf[512];
float fftBuf[8192];
int samplesRemaining = 8192;
int samplestoCopy = 512;
int FFTLEN = 8192;
int fftBufIndex = 0;
#pragma mark - EZMicrophoneDelegate
-(void) microphone:(EZMicrophone *)microphone
hasAudioReceived:(float **)buffer
withBufferSize:(UInt32)bufferSize
withNumberOfChannels:(UInt32)numberOfChannels {
// Copy the microphone buffer so it wont be changed
memcpy(tempBuf, buffer[0], bufferSize * sizeof(float)); // memcpy counts bytes, not samples
dispatch_async(dispatch_get_main_queue(),^{
// Setup the FFT if it's not already setup
if( !_isFFTSetup ){
[self createFFTWithBufferSize:FFTLEN withAudioData:fftBuf];
_isFFTSetup = YES;
}
int samplesRemaining = FFTLEN;
memcpy(fftBuf+fftBufIndex, tempBuf, samplestoCopy*sizeof(float));
fftBufIndex += samplestoCopy;
samplesRemaining -= samplestoCopy;
if (fftBufIndex == FFTLEN)
{
fftBufIndex = 0;
samplesRemaining = FFTLEN;
[self updateFFTWithBufferSize:FFTLEN withAudioData:fftBuf];
}
});
}
You likely have threading issues because you are trying to do work in some blocks that takes much much longer than the time between audio callbacks. Your code is being called repeatedly before prior calls can say that they are done (with the FFT setup or clearing the FFT buffer).
Try doing the FFT setup outside the callback before starting the recording, only copy to a circular buffer or FIFO inside the callback, and do the FFT in code async to the callback (not locked in the same block as the circular buffer copy).
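The "only copy inside the callback" part of this advice can be sketched as follows. This is a simple linear accumulator rather than a true circular buffer, written in plain C with hypothetical names (`fft_accumulator_t`, `fft_acc_push` and so on), and it leaves out the thread hand-off between the audio callback and the code that runs the FFT. Note that memcpy takes a byte count, so the sample count has to be multiplied by sizeof(float).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FFT_LEN 8192  /* FFT window length in samples */

typedef struct {
    float window[FFT_LEN]; /* samples collected for the next FFT */
    size_t fill;           /* number of valid samples in window */
} fft_accumulator_t;

static void fft_acc_reset(fft_accumulator_t *acc)
{
    acc->fill = 0;
}

static bool fft_acc_full(const fft_accumulator_t *acc)
{
    return acc->fill == FFT_LEN;
}

/* Append up to `count` samples and return how many were actually stored.
 * Samples that do not fit are left for the caller to push again after
 * the window has been processed and reset. */
static size_t fft_acc_push(fft_accumulator_t *acc, const float *samples, size_t count)
{
    size_t space = FFT_LEN - acc->fill;
    size_t n = (count < space) ? count : space;
    memcpy(acc->window + acc->fill, samples, n * sizeof(float)); /* bytes, not samples */
    acc->fill += n;
    return n;
}
```

With 512-sample callbacks the window fills after 16 pushes. At that point the FFT code would copy or swap the window, run the transform outside the audio callback, and call `fft_acc_reset`.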

How do I interpret an AudioBuffer and get the power?

I am trying to make a volume-meter for my app, which will show while recording a video. I have found a lot of support for such meters for iOS, but mostly for AVAudioPlayer, which is no option for me. I am using AVCaptureSession to record, and will then end up with the delegate method shown below:
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
CMFormatDescriptionRef formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer);
CFRetain(sampleBuffer);
CFRetain(formatDescription);
if(connection == audioConnection)
{
CMBlockBufferRef blockBuffer;
AudioBufferList audioBufferList;
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer,
NULL, &audioBufferList, sizeof(AudioBufferList), NULL, NULL,
kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
&blockBuffer);
SInt16 *data = audioBufferList.mBuffers[0].mData;
}
//Releases etc..
}
(Only showing relevant code)
From what I understand, I receive a 'sample buffer' containing either audio or video. Once I've verified that the connection is indeed audio, I 'extract' the audioBufferList from the buffer, which leaves me with a list of one (or more?) audioBuffers. The actual data is, as I understand it, represented as SInt16, a 16-bit signed integer with a range from -32,768 to 32,767. However, if I simply print out the received values, I get A LOT of bouncing numbers. In "silence" I get values bouncing rapidly between -200 and 200, and when there's noise I get values from -4,000 to 13,000, in no apparent order.
From what I've read, the value 0 represents silence. However, I do not understand the difference between negative and positive values, and I do not know whether they are able to reach all the way up or down to +-32,768.
I believe I need a percentage of how 'loud' it is, but have been unable to find anything.
I have read a couple of tutorials and references on the matter, but nothing makes sense to me. I followed one guide by doing this(appending to the code above, inside the if):
float accumulator = 0;
for(int i = 0; i < audioBufferList.mBuffers[0].mDataByteSize; i++)
accumulator += data[i] * data[i];
float power = accumulator / audioBufferList.mBuffers[0].mDataByteSize;
float decibels = log10f(power);
NSLog(@"%f", decibels);
Apparently, this code was supposed to give values between -1 and +1, but that did not happen. I am now getting values around 6.194681 in silence, and 7.773492 for some noise. This feels like the correct 'range', but in the 'wrong place'. I can't simply subtract 7 from the number and assume I'm between -1 and +1. There should be some logic and science behind how this should work, but I do not know enough about how digital audio works.
Does anyone know the logic behind this? Is 0 always silence while -32,768 and 32,767 are loud noises? Can I then simply multiply all negative values by -1 to always get positive values, and then find out how many percent they are at (between 0 and 32767)? Somehow, I don't believe this will work, as I guess there is a reason for the negative values.. I'm not completely sure what to try.
The code in your question is wrong in several ways. It tries to copy the code from the article below, but it does not properly convert the article's float-based code to 16-bit integer math. It also loops over the wrong number of values (the upper bound of i is a byte count, not a sample count), so it ends up pulling in garbage data.
https://www.mikeash.com/pyblog/friday-qa-2012-10-12-obtaining-and-interpreting-audio-data.html
The code in the article is correct. Here's what it is, expanded a bit. This is only looking at the first buffer in a 32-bit float buffer list.
float accumulator = 0;
AudioBuffer buffer = bufferList->mBuffers[0];
float * data = (float *)buffer.mData;
UInt32 numSamples = buffer.mDataByteSize / sizeof(float);
for (UInt32 i = 0; i < numSamples; i++) {
accumulator += data[i] * data[i];
}
float power = accumulator / (float)numSamples;
float decibels = 10 * log10f(power);
As the article says, the result is in decibels relative to a 0 dB reference, i.e. 0.0 is the maximum value. This is the same thing that AVAudioPlayer's averagePowerForChannel returns, for example.
To use this in your 16-bit integer context, you'd need to a) loop appropriately through each 16-bit sample, b) convert the data[i] value from a 16-bit integer to a floating point value in the [-1.0, 1.0] range before squaring and adding to the accumulator.

iOS audio queue - how to meter audio level in buffer?

I'm working on an app that should do some audio signal processing. I need to measure the audio level in each of the buffers I get (through the callback function). I've been searching the web for some time, and I found that there is a built-in property called current level metering:
AudioQueueGetProperty(recordState->queue,kAudioQueueProperty_CurrentLevelMeter,meters,&dlen);
This property gets me the average or peak audio level, but it's not synchronised to the current buffer.
I figured out I need to calculate the audio level from the buffer data by myself, so I had this:
double calcAudioRMS (SInt16 * audioData, int numOfSamples)
{
double RMS, adPercent;
RMS = 0;
for (int i=0; i<numOfSamples; i++)
{
adPercent=audioData[i]/32768.0f;
RMS += adPercent*adPercent;
}
RMS = sqrt(RMS / numOfSamples);
return RMS;
}
This function takes the audio data (cast to SInt16) and the number of samples in the current buffer. The numbers I get are indeed between 0 and 1, but they seem rather random and low compared to the numbers I got from the built-in audio level metering.
The recording audio format is:
format->mSampleRate = 8000.0;
format->mFormatID = kAudioFormatLinearPCM;
format->mFramesPerPacket = 1;
format->mChannelsPerFrame = 1;
format->mBytesPerFrame = 2;
format->mBytesPerPacket = 2;
format->mBitsPerChannel = 16;
format->mReserved = 0;
format->mFormatFlags = kLinearPCMFormatFlagIsSignedInteger |kLinearPCMFormatFlagIsPacked;
My question is: how do I get the right values from the buffer? Is there a built-in function or property for this? Or should I calculate the audio level myself, and if so, how?
Thanks in advance.
Your calculation for RMS power is correct. I'd be inclined to say that you are averaging over fewer samples than Apple does, or something similar, and that would explain the difference. You can check by inputting a loud sine wave and verifying that Apple (and you) calculate an RMS power of 1/sqrt(2).
Unless there's a good reason, I would use Apple's power calculations. I've used them, and they seem good to me. Additionally, generally you don't want RMS power, you want RMS power as decibels, or use the kAudioQueueProperty_CurrentLevelMeterDB constant. (This depends on if you're trying to build an audio meter, or truly display the audio power)

Resources