DXGI Waitable SwapChain not waiting - directx

I set up a DX12 application that only clears the back buffer every frame.
It really is bare-bones: no PSO, no root...
The only particularity is that it waits on the swap chain to be done with Present() before starting a new frame (see MSDN on waitable swap chains). I also set the frame latency to 1 and only have 2 buffers.
The first frame works well, but it immediately starts drawing the second frame, and of course the command allocator complains that it is being reset while commands are still being executed on the GPU.
I could of course set up a fence to wait for the GPU to be done before moving to a new frame, but I thought this was the job of the waitable swap chain object.
Here is the render routine :
if (m_command_allocator->Reset() == E_FAIL) { throw; }
HRESULT res = S_OK;
res = m_command_list->Reset(m_command_allocator.Get(), nullptr);
if (res == E_FAIL || res == E_OUTOFMEMORY) { throw; }
m_command_list->ResourceBarrier(1,
&CD3DX12_RESOURCE_BARRIER::Transition(m_render_targets[m_frame_index].Get(),
D3D12_RESOURCE_STATE_PRESENT, D3D12_RESOURCE_STATE_RENDER_TARGET));
m_command_list->RSSetViewports(1, &m_screen_viewport);
m_command_list->RSSetScissorRects(1, &m_scissor_rect);
m_command_list->ClearRenderTargetView(get_rtv_handle(),
DirectX::Colors::BlueViolet, 0, nullptr);
m_command_list->OMSetRenderTargets(1, &get_rtv_handle(), true, nullptr);
m_command_list->ResourceBarrier(1,
&CD3DX12_RESOURCE_BARRIER::Transition(m_render_targets[m_frame_index].Get(),
D3D12_RESOURCE_STATE_RENDER_TARGET, D3D12_RESOURCE_STATE_PRESENT));
tools::throw_if_failed(m_command_list->Close());
ID3D12CommandList* ppCommandLists[] = { m_command_list.Get() };
m_command_queue->ExecuteCommandLists(_countof(ppCommandLists),
ppCommandLists);
if (m_swap_chain->Present(1, 0) != S_OK) { throw; }
m_frame_index = m_swap_chain->GetCurrentBackBufferIndex();
I loop on this routine with a waitable object which I got from the swap chain:
while (WAIT_OBJECT_0 == WaitForSingleObjectEx(waitable_renderer, INFINITE, TRUE) && m_alive == true)
{
m_graphics.render();
}
and I initialized the swapchain with the waitable flag :
DXGI_SWAP_CHAIN_DESC1 swap_chain_desc = {};
swap_chain_desc.BufferCount = s_frame_count;
swap_chain_desc.Width = window_width;
swap_chain_desc.Height = window_height;
swap_chain_desc.Format = m_back_buffer_format;
swap_chain_desc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
swap_chain_desc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD;
swap_chain_desc.SampleDesc.Count = 1;
swap_chain_desc.Flags = DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT;
ComPtr<IDXGISwapChain1> swap_chain;
tools::throw_if_failed(
factory->CreateSwapChainForHwnd(m_command_queue.Get(), window_handle, &swap_chain_desc, nullptr, nullptr, &swap_chain));
I call the SetFrameLatency right after creating the swapChain :
ComPtr<IDXGISwapChain2> swap_chain2;
tools::throw_if_failed(m_swap_chain.As(&swap_chain2));
tools::throw_if_failed(swap_chain2->SetMaximumFrameLatency(1));
m_waitable_renderer = swap_chain2->GetFrameLatencyWaitableObject();
And the swapChain resize that goes with it :
tools::throw_if_failed(
m_swap_chain->ResizeBuffers(s_frame_count, window_width, window_height, m_back_buffer_format, DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT));
My question is: am I setting something up incorrectly? Or is this how a waitable swap chain works (i.e. you also need to sync with the GPU using fences before waiting for the swap chain to become available)?
EDIT: Adding SetFrameLatency call + C++ coloring

The waitable swap chain is independent of the work of protecting D3D12 objects from being modified or reset while they are still in use by the GPU.
A waitable swap chain allows you to move the wait from the end of the frame, in Present, to the start of the frame, using a waitable object. This helps reduce latency and gives you more control over queuing.
Fencing lets you query the GPU for completion. I recommend that you do not just cross your fingers: if it works one day on one system, it may not work the next day with a different driver or a different machine.
Because you do not want each frame to wait for GPU completion, you have to create several command allocators. The usual count is min(maxlatency+1, swap chain buffer count), but for safety I personally use back buffer count + 1..3. You will later find that you end up creating many more allocators to deal with multi-threading anyway.
What this means for your code:
1. Create several allocators in a ring buffer, each with an associated fence value
2. Create a fence (and a global next fence value)
3. Wait on the swap chain
4. Pick the next allocator
5. If fence.GetCompletedValue() < allocator.fenceValue, then wait for completion
6. Render
7. Signal the fence with the command queue, store the fence value in the allocator, and increment
8. Jump to 3
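The loop above can be sketched without any D3D12 calls to show where the CPU actually blocks. This is only a simulation under stated assumptions, not real D3D12 code: MockFence, AllocatorSlot and FrameRing are hypothetical names, and the mock "GPU" completes work only when the CPU waits for it. In a real application the marked lines would presumably map to ID3D12Fence::GetCompletedValue, ID3D12Fence::SetEventOnCompletion and ID3D12CommandQueue::Signal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ID3D12Fence plus the GPU behind it.
struct MockFence {
    std::uint64_t completed = 0;   // what GetCompletedValue() would return
    std::uint64_t wait_calls = 0;  // how many times the CPU had to block

    // Stand-in for SetEventOnCompletion + WaitForSingleObject.
    void wait_for(std::uint64_t value) {
        if (completed < value) {
            ++wait_calls;
            completed = value;  // pretend the GPU caught up while we blocked
        }
    }
};

// One command allocator in the ring, tagged with the fence value that was
// signaled after its last submission (0 means "never used").
struct AllocatorSlot {
    std::uint64_t fence_value = 0;
};

class FrameRing {
public:
    FrameRing(std::size_t slot_count, MockFence& fence)
        : slots_(slot_count), fence_(fence) {
        assert(slot_count > 0);
    }

    // One iteration of the loop (steps 4 to 7). Returns the signaled value.
    std::uint64_t render_frame() {
        AllocatorSlot& slot = slots_[next_slot_];
        next_slot_ = (next_slot_ + 1) % slots_.size();

        // Step 5: only after this wait is it safe to Reset() the allocator.
        fence_.wait_for(slot.fence_value);

        // Step 6 would go here: reset, record, ExecuteCommandLists, Present.

        // Step 7: queue->Signal(fence, value), remember it, then increment.
        slot.fence_value = next_fence_value_++;
        return slot.fence_value;
    }

private:
    std::vector<AllocatorSlot> slots_;
    MockFence& fence_;
    std::size_t next_slot_ = 0;
    std::uint64_t next_fence_value_ = 1;
};
```

With three slots, the first three frames never block because their slots are unused. The fourth frame reuses the first slot, so it waits for the value that was signaled after frame one, not for the frame that was just submitted.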


STM32 - Reading I2S to record a .WAV file. Audio choppy, what is causing it?

I'm using an STM32 (STM32F446RE) to receive audio from two INMP441 MEMS microphones in a stereo setup via the I2S protocol and record it into a .WAV file on a micro SD card, using the HAL library.
I wrote the firmware that records audio into a .WAV with FreeRTOS. But the audio files that I record sound like Darth Vader. Here is a screenshot of the audio in audacity:
if you zoom in you can see a constant noise being inserted in between the real audio data:
I don't know what is causing this.
I have tried increasing the MessageQueue size, but that doesn't seem to be the problem; the queue is kept at 0 most of the time. I've tried different frame sizes and sampling rates, changing the number of channels, and using only one INMP441. All this without any success.
I'll now explain the firmware.
Here is a block diagram of the architecture for the RTOS that I have implemented:
It consists of three tasks. The first one receives a command via UART (with interrupts) that signals to start or stop recording. The second one is simply a state machine that walks through the steps to write a .WAV.
Here the code for the WriteWavFileTask:
switch(audio_state)
{
case STATE_START_RECORDING:
sprintf(filename, "%saud_%03d.wav", SDPath, count++);
do
{
res = f_open(&file_ptr, filename, FA_CREATE_ALWAYS|FA_WRITE);
}
while(res != FR_OK);
res = fwrite_wav_header(&file_ptr, I2S_SAMPLE_FREQUENCY, I2S_FRAME, 2);
HAL_I2S_Receive_DMA(&hi2s2, aud_buf, READ_SIZE);
audio_state = STATE_RECORDING;
break;
case STATE_RECORDING:
osDelay(50);
break;
case STATE_STOP:
HAL_I2S_DMAStop(&hi2s2);
while(osMessageQueueGetCount(AudioQueueHandle)) osDelay(1000);
filesize = f_size(&file_ptr);
data_len = filesize - 44;
total_len = filesize - 8;
f_lseek(&file_ptr, 4);
f_write(&file_ptr, (uint8_t*)&total_len, 4, bw);
f_lseek(&file_ptr, 40);
f_write(&file_ptr, (uint8_t*)&data_len, 4, bw);
f_close(&file_ptr);
audio_state = STATE_IDLE;
break;
case STATE_IDLE:
osThreadSuspend(WAVHandle);
audio_state = STATE_START_RECORDING;
break;
default:
osDelay(50);
break;
Here are the macros used in the code for readability:
#define I2S_DATA_WORD_LENGTH (24) // industry-standard 24-bit I2S
#define I2S_FRAME (32) // bits per sample
#define READ_SIZE (128) // samples to read from I2S
#define WRITE_SIZE (READ_SIZE*I2S_FRAME/16) // half words to write
#define WRITE_SIZE_BYTES (WRITE_SIZE*2) // bytes to write
#define I2S_SAMPLE_FREQUENCY (16000) // sample frequency
The last task is responsible for processing the buffer received via I2S. Here is the code:
void convert_endianness(uint32_t *array, uint16_t Size) {
for (int i = 0; i < Size; i++) {
array[i] = __REV(array[i]);
}
}
void HAL_I2S_RxCpltCallback(I2S_HandleTypeDef *hi2s)
{
convert_endianness((uint32_t *)aud_buf, READ_SIZE);
osMessageQueuePut(AudioQueueHandle, aud_buf, 0L, 0);
HAL_I2S_Receive_DMA(hi2s, aud_buf, READ_SIZE);
}
void pvrWriteAudioTask(void *argument)
{
/* USER CODE BEGIN pvrWriteAudioTask */
static UINT *bw;
static uint16_t aud_ptr[WRITE_SIZE];
/* Infinite loop */
for(;;)
{
osMessageQueueGet(AudioQueueHandle, aud_ptr, 0L, osWaitForever);
res = f_write(&file_ptr, aud_ptr, WRITE_SIZE_BYTES, bw);
}
/* USER CODE END pvrWriteAudioTask */
}
This task reads from a queue an array of 256 uint16_t elements containing the raw PCM audio data. f_write takes its size parameter as the number of bytes to write to the SD card, so 512 bytes. The I2S receives 128 frames (for a 32-bit frame, 128 words).
The following is the configuration for the I2S and clocks:
Any help would be much appreciated!
Solution
As pmacfarlane pointed out, the problem was with the method used for buffering the audio data. The solution consisted of easing the overhead on the ISR and implementing a circular DMA for double buffering. Here is the code:
#define I2S_DATA_WORD_LENGTH (24) // industry-standard 24-bit I2S
#define I2S_FRAME (32) // bits per sample
#define READ_SIZE (128) // samples to read from I2S
#define BUFFER_SIZE (READ_SIZE*I2S_FRAME/16) // number of uint16_t elements expected
#define WRITE_SIZE_BYTES (BUFFER_SIZE*2) // bytes to write
#define I2S_SAMPLE_FREQUENCY (16000) // sample frequency
uint16_t aud_buf[2*BUFFER_SIZE]; // Double buffering
static uint16_t * volatile BufPtr; // the pointer itself is shared with the ISRs
void convert_endianness(uint32_t *array, uint16_t Size) {
for (int i = 0; i < Size; i++) {
array[i] = __REV(array[i]);
}
}
void HAL_I2S_RxHalfCpltCallback(I2S_HandleTypeDef *hi2s)
{
BufPtr = aud_buf;
osSemaphoreRelease(RxAudioSemHandle);
}
void HAL_I2S_RxCpltCallback(I2S_HandleTypeDef *hi2s)
{
BufPtr = &aud_buf[BUFFER_SIZE];
osSemaphoreRelease(RxAudioSemHandle);
}
void pvrWriteAudioTask(void *argument)
{
/* USER CODE BEGIN pvrWriteAudioTask */
UINT bw; /* number of bytes actually written, filled in by f_write */
/* Infinite loop */
for(;;)
{
osSemaphoreAcquire(RxAudioSemHandle, osWaitForever);
convert_endianness((uint32_t *)BufPtr, READ_SIZE);
res = f_write(&file_ptr, BufPtr, WRITE_SIZE_BYTES, &bw);
}
/* USER CODE END pvrWriteAudioTask */
}
Problems
I think the problem is your method of buffering the audio data - mainly in this function:
void HAL_I2S_RxCpltCallback(I2S_HandleTypeDef *hi2s)
{
convert_endianness((uint32_t *)aud_buf, READ_SIZE);
osMessageQueuePut(AudioQueueHandle, aud_buf, 0L, 0);
HAL_I2S_Receive_DMA(hi2s, aud_buf, READ_SIZE);
}
The main problem is that you are re-using the same buffer each time. You have queued a message to save aud_buf to the SD-card, but you've also instructed the I2S to start DMAing data into that same buffer, before it has been saved. You'll end up saving some kind of mish-mash of "old" data and "new" data.
@Flexz pointed out that the message queue takes a copy of the data, so there is no issue with the I2S writing over the data that is being written to the SD-card. However, taking the copy (in an ISR) adds overhead, and delays the start of the new I2S DMA.
Another problem is that you are doing the endian conversion in this function (that is called from an ISR). This will block any other (lower priority) interrupts from being serviced while this happens, which is a bad thing in an embedded system. You should do the endian conversion in the task that reads from the queue. ISRs should be very short and do the minimum possible work (often just setting a flag, giving a semaphore, or adding something to a queue).
Lastly, while you are doing the endian conversion, what is happening to audio samples? The previous DMA has completed, and you haven't started a new one, so they will just be dropped on the floor.
Possible solution
You probably want to allocate a suitably big buffer, and configure your DMA to work in circular buffer mode. This means that once started, the DMA will continue forever (until you stop it), so you'll never drop any samples. There won't be any gap between one DMA finishing and a new one starting, since you never need to start a new one.
The DMA provides a "half-complete" interrupt, to say when it has filled half the buffer. So start the DMA, and when you get the half-complete interrupt, queue up the first half of the buffer to be saved. When you get the fully-complete interrupt, queue up the second half of the buffer to be saved. Rinse and repeat.
You might want to add some logic to detect if the interrupt happens before the previous save has completed, since the data will be overrun and possibly corrupted. Depending on the speed of the SD-card (and the sample rate), this may or may not be a problem.
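The overrun check described above could be sketched as follows. This is a host-side model, not firmware: PingPongBuffer and its members are hypothetical names, kHalfWords mirrors BUFFER_SIZE from the question, and the flags are plain bools for brevity, whereas state shared between an ISR and a task on the target would need to be volatile or atomic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// uint16_t elements per half buffer (BUFFER_SIZE in the question's code).
constexpr std::size_t kHalfWords = 256;

struct PingPongBuffer {
    // The memory a circular DMA would fill continuously.
    std::array<std::uint16_t, 2 * kHalfWords> data{};
    // half_pending[i] is true while half i is full and not yet saved.
    bool half_pending[2] = {false, false};
    unsigned overruns = 0;

    // Intended to be called with half = 0 from the half-complete interrupt
    // and with half = 1 from the fully-complete interrupt.
    void on_dma_half_filled(int half) {
        assert(half == 0 || half == 1);
        if (half_pending[half]) {
            // The writer had not saved this half before the DMA wrapped
            // around and refilled it, so the older samples were lost.
            ++overruns;
        }
        half_pending[half] = true;
    }

    // Start of the half the writer task should save next.
    const std::uint16_t* half_ptr(int half) const {
        assert(half == 0 || half == 1);
        return data.data() + static_cast<std::size_t>(half) * kHalfWords;
    }

    // Called by the writer task once the half has been written to the card.
    void mark_saved(int half) {
        assert(half == 0 || half == 1);
        half_pending[half] = false;
    }
};
```

A non-zero overruns count after a recording would suggest that the SD-card writes cannot keep up with the sample rate, in which case a larger buffer or larger write blocks may be worth trying.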

How to create render target views on swap chain with multiple frame buffers in directx 11

I'm trying to implement a swap chain with more than 1 back buffer, but I'm having troubles creating render target views for any buffer after the zero-th.
I create my swap chain like so:
IDXGIFactory1* idxgiFactory;
// D3D_CALL is just a macro that throws exception with info on error
D3D_CALL(CreateDXGIFactory1(__uuidof(IDXGIFactory1), &idxgiFactory));
DXGI_SWAP_CHAIN_DESC sd;
ZeroMemory(&sd, sizeof(sd));
sd.BufferCount = BUFFER_COUNT; // currently 2
sd.BufferDesc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
sd.BufferDesc.RefreshRate.Numerator = 0;
sd.BufferDesc.RefreshRate.Denominator = 0;
sd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT | D3D11_BIND_RENDER_TARGET;
sd.OutputWindow = window.GetHandle(); // wrapper for my window
sd.SampleDesc.Count = 4;
sd.SampleDesc.Quality = 1;
sd.SwapEffect = DXGI_SWAP_EFFECT_SEQUENTIAL;
sd.Windowed = TRUE;
D3D_CALL(idxgiFactory->CreateSwapChain(
m_HWDevice, // ptr to ID3D11Device
&sd,
&m_HWSwapChain));
It was working till now with single frame buffer with swap effect DISCARD, which is outdated and performs poorly according to MSDN.
After I create the swap chain I get the backbuffers and create views like so:
// this is called with buffer index from 0 till BUFFER_COUNT - 1
// 'm_RenderTarget' is simply an array of ID3D11Texture2D, where the size matches BUFFER_COUNT
D3D_CALL(m_HWSwapChain->GetBuffer(bufferIndex, __uuidof(ID3D11Texture2D), (LPVOID*)&m_RenderTarget[bufferIndex]));
// I then attempt to create the RTV like so:
ID3D11RenderTargetView* rtv = NULL;
D3D_CALL(m_HWDevice->CreateRenderTargetView(m_RenderTarget[bufferIndex], NULL, &rtv));
The code that creates the render target views works fine for 'bufferIndex' 0, but for index 1 I get the following error:
D3D11 ERROR: ID3D11Device::CreateRenderTargetView: A render-target view cannot be made on a read-only resource. (Perhaps a DXGI_SWAP_CHAIN buffer other than buffer 0?) [ STATE_CREATION ERROR #135: CREATERENDERTARGETVIEW_INVALIDDESC]
I assume from this I have to use a D3D11_RENDER_TARGET_VIEW_DESC and fill the D3D11_BUFFER_RTV structure inside? No idea how to setup this though and could not find any examples.
I tried to create RTV with descriptor like so:
D3D11_RENDER_TARGET_VIEW_DESC rtvDesc;
ZeroMemory(&rtvDesc, sizeof(rtvDesc));
rtvDesc.Buffer.NumElements = BUFFER_COUNT;
rtvDesc.Buffer.FirstElement = 0;
rtvDesc.Buffer.ElementOffset = size.x * size.y * 4; // dimensions of my back buffer * 4 bytes per pixel
rtvDesc.Buffer.ElementWidth = 4; // 4 bytes per pixel in DXGI_FORMAT_B8G8R8A8_UNORM
This gives error:
D3D11 ERROR: ID3D11Device::CreateRenderTargetView: The ViewDimension in the View Desc incompatible with the type of the Resource. [ STATE_CREATION ERROR #129: CREATESHADERRESOURCEVIEW_INVALIDRESOURCE]
Not sure what I'm missing here.
I had a bit of a misunderstanding how this works. Some key pieces of information I was missing:
*
If the swap chain's swap effect is either DXGI_SWAP_EFFECT_SEQUENTIAL
or DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL, only the swap chain's zero-index
buffer can be read from and written to. The swap chain's buffers with
indexes greater than zero can only be read from; so if you call the
IDXGIResource::GetUsage method for such buffers, they have the
DXGI_USAGE_READ_ONLY flag set.
Source: https://learn.microsoft.com/en-us/windows/win32/api/dxgi/nf-dxgi-idxgiswapchain-getbuffer
Another key point is that DX11 manages frame buffers automatically. I was trying to get all the frame buffers and write to each of them in turn. Apparently I only ever need to get the 0th frame buffer once, create a view to it, and only care about that. The rest is managed automatically behind the scenes by DX11. That is not the case for DX12.
Related question: Does the backbuffer that a RenderTargetView points to automagically change after Present?

Why are there time bubbles in my GPU timeline even when triple buffering?

I'm having trouble understanding why there are time bubbles on my GPU timeline when inspecting my app using PIX timing captures. Here is a picture of one of the time bubbles I'm talking about, highlighted in orange:
The timeline actually doesn't look at all how I expected. Since I am triple buffering, I would expect the GPU to be constantly working, without any time gaps between frames because the CPU is easily able to feed commands to the GPU before the GPU is done processing them. Instead, it doesn't seem like the CPU is 3 frames ahead. It seems like the CPU is constantly waiting for the GPU to be finished before it starts working on a new frame. So it makes me wonder if my triple buffering code is possibly broken? Here is my code for moving to the next frame:
void gpu_interface::next_frame()
{
UINT64 current_frame_fence_value = get_frame_resource()->fence_value;
UINT64 next_frame_fence_value = current_frame_fence_value + 1;
check_hr(swapchain->Present(0, 0));
check_hr(graphics_cmd_queue->Signal(fence.Get(), current_frame_fence_value));
{
// CPU and GPU frame-to-frame event.
PIXEndEvent(graphics_cmd_queue.Get());
PIXBeginEvent(graphics_cmd_queue.Get(), 0, "fence value: %d", next_frame_fence_value);
}
// Check if the next frame is ready to be rendered.
// The GPU must have reached at least up to the fence value of the frame we're about to render.
if (fence->GetCompletedValue() < current_frame_fence_value)
{
PIXBeginEvent(0, "CPU Waiting for GPU to reach fence value: %d", current_frame_fence_value);
// Wait for the next frame resource to be ready
fence->SetEventOnCompletion(current_frame_fence_value, fence_event);
WaitForSingleObject(fence_event, INFINITE);
PIXEndEvent();
}
// Next frame is ready to be rendered
// Update the frame_index. GetCurrentBackBufferIndex() gets incremented after swapchain->Present() calls.
frame_index = swapchain->GetCurrentBackBufferIndex();
frames[frame_index].fence_value = next_frame_fence_value;
}
Here's the whole timing capture: https://1drv.ms/u/s!AiGFMy6hVmtNgaky52n7QDrQ6o7V1A?e=MFc4xW
EDIT: Fixed answer
void gpu_interface::next_frame()
{
check_hr(swapchain->Present(0, 0));
UINT64 current_frame_fence_value = get_frame_resource()->fence_value;
UINT64 next_frame_fence_value = current_frame_fence_value + 1;
check_hr(graphics_cmd_queue->Signal(fence.Get(), current_frame_fence_value));
//// Update the frame_index. GetCurrentBackBufferIndex() gets incremented after swapchain->Present() calls.
frame_index = swapchain->GetCurrentBackBufferIndex();
// The GPU must have reached at least up to the fence value of the frame we're about to render.
size_t minimum_fence = get_frame_resource()->fence_value;
size_t completed = fence->GetCompletedValue();
if (completed < minimum_fence)
{
PIXBeginEvent(0, "CPU Waiting for GPU to reach fence value: %d", minimum_fence);
// Wait for the next frame resource to be ready
fence->SetEventOnCompletion(minimum_fence, fence_event);
WaitForSingleObject(fence_event, INFINITE);
PIXEndEvent();
}
frames[frame_index].fence_value = next_frame_fence_value;
{
// CPU and GPU frame-to-frame event.
PIXEndEvent(graphics_cmd_queue.Get());
PIXBeginEvent(graphics_cmd_queue.Get(), 0, "fence value: %d", next_frame_fence_value);
}
}
Timing capture of the correct code: https://1drv.ms/u/s!AiGFMy6hVmtNgakzGizTiA_s-FwPqA?e=qIHHTw
You signal the queue with current_frame_fence_value, and right after that you check whether the fence has already completed that same value:
if (fence->GetCompletedValue() < current_frame_fence_value)
Instead, you need to check the fence value for the next frame to see if you can continue, and that is fence_values[frame_index], where frame_index has already been updated. It would go something like this:
void gpu_interface::next_frame()
{
check_hr(swapchain->Present(0, 0));
UINT64 current_frame_fence_value = get_frame_resource()->fence_value;
check_hr(graphics_cmd_queue->Signal(fence.Get(), current_frame_fence_value));
UINT64 next_frame_fence_value = current_frame_fence_value + 1;
frame_index = swapchain->GetCurrentBackBufferIndex();
// The GPU must have reached at least up to the fence value of the frame we're about to render.
// current_frame_fence_value is not the fence value of the frame you are about to render; that is fence_values[frame_index].
// Note that frame_index is updated before this check.
if (fence->GetCompletedValue() < fence_values[frame_index])
{
// Wait for the next frame resource to be ready
fence->SetEventOnCompletion(fence_values[frame_index], fence_event);
WaitForSingleObject(fence_event, INFINITE);
}
frames[frame_index].fence_value = next_frame_fence_value;
}
Try writing down fence values for the first few frames to see how that works.
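One way to write those values down is to model only the bookkeeping, with no GPU involved. The sketch below is a simulation under stated assumptions: trace_fence_values and FrameTrace are hypothetical names, the back buffer index is assumed to advance round-robin after each Present, and the first frame is assumed to start with fence value 1 while the other slots start at 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct FrameTrace {
    std::uint64_t signaled;    // value passed to Signal() for the frame just submitted
    std::uint64_t waited_for;  // fence_values[frame_index] after the index advanced
};

// Replays the bookkeeping of the corrected next_frame() for frame_count frames.
std::vector<FrameTrace> trace_fence_values(std::size_t buffer_count,
                                           std::size_t frame_count) {
    assert(buffer_count > 0);
    std::vector<std::uint64_t> fence_values(buffer_count, 0);
    std::size_t frame_index = 0;
    fence_values[frame_index] = 1;  // assumption: the first frame uses value 1

    std::vector<FrameTrace> trace;
    for (std::size_t i = 0; i < frame_count; ++i) {
        // Signal(fence, current) for the frame that was just presented.
        const std::uint64_t current = fence_values[frame_index];
        // Models GetCurrentBackBufferIndex() after Present (assumed round-robin).
        frame_index = (frame_index + 1) % buffer_count;
        // The CPU may only continue once the fence has reached this value.
        trace.push_back({current, fence_values[frame_index]});
        // The frame about to be recorded will signal the next value.
        fence_values[frame_index] = current + 1;
    }
    return trace;
}
```

For three buffers the trace is (1, 0), (2, 0), (3, 1), (4, 2), (5, 3): once the ring is full, the CPU waits for the value signaled two frames earlier, which lets it run ahead of the GPU. Waiting on the value that was just signaled forces the CPU to wait for the frame it has only just submitted.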

Writing a program to count process memory pages in xv6

I'm trying to write a system call that returns the number of memory pages the current process is using but I have no idea where to start and which variables I should look at.
I saw two variables sz and pgdir in proc.h. But I don't know what each of them represents exactly.
Looking at proc.c, you have all you want to understand the memory management:
// Grow current process's memory by n bytes.
// Return 0 on success, -1 on failure.
int
growproc(int n)
{
uint sz;
struct proc *curproc = myproc();
sz = curproc->sz;
if((sz = allocuvm(curproc->pgdir, sz, sz + n)) == 0)
return -1;
curproc->sz = sz;
switchuvm(curproc);
return 0;
}
growproc is used to increase the process memory by n bytes. This function is used by the sbrk syscall, which is itself called by malloc.
From this, we can conclude that sz in struct proc is the process memory size in bytes.
Reading allocuvm from vm.c, you can see two macros:
PGROUNDUP(size), which rounds a memory size up to the next multiple of the page size,
PGSIZE, which is the page size.
So, the number of pages actually used by a process is PGROUNDUP(proc->sz)/PGSIZE.
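The arithmetic can be checked outside the kernel. The sketch below assumes the x86 xv6 page size of 4096 bytes and re-implements the rounding as a function; pg_round_up and pages_used are hypothetical names, not part of xv6.

```cpp
#include <cassert>
#include <cstdint>

// Page size assumed here to match x86 xv6.
constexpr std::uint32_t PGSIZE = 4096;

// Same arithmetic as the PGROUNDUP macro: round sz up to a multiple of PGSIZE.
constexpr std::uint32_t pg_round_up(std::uint32_t sz) {
    return (sz + PGSIZE - 1) & ~(PGSIZE - 1);
}

// Number of pages needed to hold sz bytes of process memory.
constexpr std::uint32_t pages_used(std::uint32_t sz) {
    return pg_round_up(sz) / PGSIZE;
}

// A partially used page still counts as a whole page.
static_assert(pages_used(4097) == 2, "4097 bytes span two pages");
```

Inside the kernel, the body of the system call would presumably reduce to returning PGROUNDUP(myproc()->sz) / PGSIZE.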

How do I increase the size of EZAudio EZMicrophone?

I would like to use the EZAudio framework to do realtime microphone signal FFT processing, along with some other processing in order to determine the peak frequency.
The problem is, the EZMicrophone class only appears to work on 512 samples; however, my signal requires an FFT of 8192 or even 16384 samples. There doesn't appear to be a way to change the buffer size in EZMicrophone, but I've read posts that recommend creating an array of my target size and appending the microphone buffer to it, then doing the FFT when it's full.
When I do this though, I get large chunks of memory with no data, or discontinuities between the segments of copied memory. I think it may have something to do with the timing or order in which the microphone delegate is being called, or with memory being overwritten in different threads... I'm grasping at straws here. Am I correct in assuming that this code is executed every time the microphone buffer is full of a new 512 samples?
Can anyone suggest what I may be doing wrong? I've been stuck on this for a long time.
Here is the post I've been using as a reference:
EZAudio: How do you separate the buffersize from the FFT window size(desire higher frequency bin resolution).
// Global variables which are bad but I'm just trying to make things work
float tempBuf[512];
float fftBuf[8192];
int samplesRemaining = 8192;
int samplestoCopy = 512;
int FFTLEN = 8192;
int fftBufIndex = 0;
#pragma mark - EZMicrophoneDelegate
-(void) microphone:(EZMicrophone *)microphone
hasAudioReceived:(float **)buffer
withBufferSize:(UInt32)bufferSize
withNumberOfChannels:(UInt32)numberOfChannels {
// Copy the microphone buffer so it wont be changed
memcpy(tempBuf, buffer[0], bufferSize);
dispatch_async(dispatch_get_main_queue(),^{
// Setup the FFT if it's not already setup
if( !_isFFTSetup ){
[self createFFTWithBufferSize:FFTLEN withAudioData:fftBuf];
_isFFTSetup = YES;
}
int samplesRemaining = FFTLEN;
memcpy(fftBuf+fftBufIndex, tempBuf, samplestoCopy*sizeof(float));
fftBufIndex += samplestoCopy;
samplesRemaining -= samplestoCopy;
if (fftBufIndex == FFTLEN)
{
fftBufIndex = 0;
samplesRemaining = FFTLEN;
[self updateFFTWithBufferSize:FFTLEN withAudioData:fftBuf];
}
});
}
You likely have threading issues because you are trying to do work in some blocks that takes much much longer than the time between audio callbacks. Your code is being called repeatedly before prior calls can say that they are done (with the FFT setup or clearing the FFT buffer).
Try doing the FFT setup outside the callback before starting the recording, only copy to a circular buffer or FIFO inside the callback, and do the FFT in code async to the callback (not locked in the same block as the circular buffer copy).
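The accumulation step on its own might look like the sketch below. It is written in plain C++ rather than Objective-C, FftWindowAccumulator is a hypothetical name, and it leaves out the threading entirely: it only shows how fixed-size callback chunks can be appended until a full FFT window is available.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

class FftWindowAccumulator {
public:
    explicit FftWindowAccumulator(std::size_t window_len)
        : window_(window_len), filled_(0) {
        assert(window_len > 0);
    }

    // Appends count samples. Each time the window becomes full, on_window is
    // called with the window data and its length, and filling starts over.
    template <typename OnWindow>
    void push(const float* samples, std::size_t count, OnWindow&& on_window) {
        while (count > 0) {
            const std::size_t n = std::min(count, window_.size() - filled_);
            // memcpy takes a size in bytes, so scale the sample count.
            std::memcpy(window_.data() + filled_, samples, n * sizeof(float));
            filled_ += n;
            samples += n;
            count -= n;
            if (filled_ == window_.size()) {
                on_window(window_.data(), window_.size());
                filled_ = 0;
            }
        }
    }

    // Number of samples currently waiting in the partial window.
    std::size_t filled() const { return filled_; }

private:
    std::vector<float> window_;
    std::size_t filled_;
};
```

With a window of 8192 and chunks of 512 samples, on_window fires once every 16 pushes. Note the byte count passed to memcpy: if bufferSize in the delegate is a sample count, which appears to be the case, copying bufferSize bytes would transfer only a quarter of each chunk and could explain the empty regions.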
