iOS AudioQueue stuttering - ios

I am building a streaming system for audio playback, and every so often, the audio either glitches or starts stuttering for a second or two.
I am running a single output audioQueue with 3 allocated buffers of 1024 samples with a sample rate of 22050.
I hold a separate list of buffers ready to stream, and that buffer is never empty (logs always show at least one filled buffer there whenever a playback_callback is called). playback_callback just memcpy-s a ready buffer into one of the three AudioBuffers with no locks or other weirdness.
playback_callback takes at most 0.9ms to run (measured via mach_absolute_time), which is far below the 1.0/(22050/1024) = 46.4 ms.
I initiate the queue with either CFRunLoopGetMain () or NULL (which should use an "internal thread") and get the same behavior in both cases.
If buffer size is turned absurdly high (16384 instead of 1024), glitches go away. If the nuber of AudioBuffers is turned from 3 to 8, it practically goes away (happens ~20x more rarely). However, neither of those settings is workable for me as it is not ok for the system to take a second to react to a stream switch (0.1 - 0.2s would still be tolerable)
Any help and ideas on the matter would be greatly appreciated.

Related

Why is Print Screen versus what is actually displaying on the monitor are different?

I'm working on an application that screen captures a monitor in real-time, encodes it, sends it over ethernet, decodes it, then displays that monitor in an application.
So I put the decoder application on the same monitor that is being captured. I then open a timer application and put it next to the decoder application. I can then start the timer and see the latency between main instance of the timer and the timer within the application.
What's weird is that if I take a picture of the monitor with a camera, I get one latency measurement (almost always ~100ms) but if I take a Print Screen of the monitor, the latency between the two is much lower (~30-60ms).
Why is that? How does Print Screen work? Why would it result in 40+ ms difference? Which latency measurement should I trust?
Print Screen saves the screenshot to your clipboard which is stored on your RAM (highest speed storage system in your computer), whereas what you are doing probably writes the screenshot data to your HDD/SSD and then reads it again to send over the internet, which takes a lot longer to do.

Screen tearing and camera capture with Metal

To avoid writing to a constant buffer from both the gpu and cpu at the same time, Apple recommends using a triple-buffered system with the help of a semaphore to prevent the cpu getting too far ahead of the gpu (this is fine and covered in at least three Metal videos now at this stage).
However, when the constant resource is an MTLTexture and the AVCaptureVideoDataOutput delegate runs separately than the rendering loop (CADisplaylink), how can a similar triple-buffered system (as used in Apple’s sample code MetalVideoCapture) guarantee synchronization? Screen tearing (texture tearing) can be observed if you take the MetalVideoCapture code and simply render a full screen quad and change the preset to AVCaptureSessionPresetHigh (at the moment the tearing is obscured by the rotating quad and low quality preset).
I realize that the rendering loop and the captureOutput delegate method (in this case) are both on the main thread and that the semaphore (in the rendering loop) keeps the _constantDataBufferIndex integer in check (which indexes into the MTLTexture for creation and encoding), but screen tearing can still be observed, which is puzzling to me (it would make sense if the gpu writing of the texture is not the next frame after encoding but 2 or 3 frames after, but I don’t believe this to be the case). Also, just a minor point: shouldn’t the rendering loop and the captureOutput have the same frame rate for a buffered texture system so old frames aren’t rendered interleaved with recent ones.
Any thoughts or clarification on this matter would be greatly appreciated; there is another example from McZonk, which doesn’t use the triple-buffered system, but I also observed tearing with this approach (but less so). Obviously, no tearing is observed if I use waitUntilCompleted (equivalent to Open GL’s glfinish), but thats like playing an accordion with one arm tied behind your back!

AudioQueue's Peak Power Progression is too Gradual

I'm using the standard Apple AudioQueue Services from the AudioToolBox library to use my device's microphone. I am keeping track of the peak audio level in real time, and this is working well except it does not respond as expected to uneven changes in volume. The mPeakPower jumps appropriately high for sudden increases in volume (no problem there), but it decreases only gradually: if I feed my device a loud clap followed by silence it jumps up high, then immediately starts to steadily decrease until it reaches the actual current volume (this decrease can take up to 2 seconds). Ideally, a loud sound followed by silence should cause a spike which would immediately fall back to the ambient volume. I need to quickly respond to both increases and decreases in volume. Any insights? I assume this is happening because of a smoothing algorithm, but how can I get around it? Should I be using a different library?
Here's a snippet of my code in the AudioQueueInputCallback method.
AudioQueueLevelMeterState meters[2];
UInt32 dlen = sizeof(meters);
AudioQueueGetProperty(myQueue,kAudioQueueProperty_CurrentLevelMeterDB,meters,&dlen);
NSNumber *ambientVolume = [NSNumber numberWithFloat:meters[0].mPeakPower];
Thanks!

How to synchronize the filling of a ring buffer in a background thread with a remoteio callback

Im looking for an implemtation that uses a ring buffer in remoteio to output a very large audio file.
I have come across the CARingBuffer from apple but I've had a nightmare trying to implement it in my ios project.
As an alternative I came across this ring buffer that I've using (unsuccessfully).
Ring Buffer
How I tried to implement this is as follows.
Open an audio file which is perfectly cut using extaudiofileref.
Fully fill my ring buffer reading from the file (number of frame % inTimeSamples = readpoint)
In my callback if the ring buffer is less than 50% full I call performselector in background to add more samples.
If there is enough samples I just read from the buffer.
This all seems to work fine until I come close to the end of the file and want to loop it. When the reapoint + the number of samples needed to fill the ring buffer exceeds the total number of frames I extract some audio from the remainder of the file, seek to frame 0, then read the rest.
This always sounds glitchy. I think it may have something to do with the fact that the remoteio callback is running much quicker than the background thread so by the time the background thread has completed not only has the calculated readpoint changed but the head and tail of the buffer are not what they should be.
If example code would be too immense to post I would accept pseudo code as an answer. My methodology to solve this is lacking.
This may not be the answer you're looking for, but SFBAudioEngine compiles and runs on iOS and will handle this use case easily. It's basically a higher-level abstraction for the RemoteIO AU and supports many more formats than Core Audio does natively.

How can I ensure the correct frame rate when recording an animation using DirectShow?

I am attempting to record an animation (computer graphics, not video) to a WMV file using DirectShow. The setup is:
A Push Source that uses an in-memory bitmap holding the animation frame. Each time FillBuffer() is called, the bitmap's data is copied over into the sample, and the sample is timestamped with a start time (frame number * frame length) and duration (frame length). The frame rate is set to 10 frames per second in the filter.
An ASF Writer filter. I have a custom profile file that sets the video to 10 frames per second. Its a video-only filter, so there's no audio.
The pins connect, and when the graph is run, a wmv file is created. But...
The problem is it appears DirectShow is pushing data from the Push Source at a rate greater than 10 FPS. So the resultant wmv, while playable and containing the correct animation (as well as reporting the correct FPS), plays the animation back several times too slowly because too many frames were added to the video during recording. That is, a 10 second video at 10 FPS should only have 100 frames, but about 500 are being stuffed into the video, resulting in the video being 50 seconds long.
My initial attempt at a solution was just to slow down the FillBuffer() call by adding a sleep() for 1/10th second. And that indeed does more or less work. But it seems hackish, and I question whether that would work well at higher FPS.
So I'm wondering if there's a better way to do this. Actually, I'm assuming there's a better way and I'm just missing it. Or do I just need to smarten up the manner in which FillBuffer() in the Push Source is delayed and use a better timing mechanism?
Any suggestions would be greatly appreciated!
I do this with threads. The main thread is adding bitmaps to a list and the recorder thread takes bitmaps from that list.
Main thread
Animate your graphics at time T and render bitmap
Add bitmap to renderlist. If list is full (say more than 8 frames) wait. This is so you won't use too much memory.
Advance T with deltatime corresponding to desired framerate
Render thread
When a frame is requested, pick and remove a bitmap from the renderlist. If list is empty wait.
You need a threadsafe structure such as TThreadList to hold the bitmaps. It's a bit tricky to get right but your current approach is guaranteed to give to timing problems.
I am doing just the right thing for my recorder application (www.videophill.com) for purposes of testing the whole thing.
I am using Sleep() method to delay the frames, but am taking great care to ensure that timestamps of the frames are correct. Also, when Sleep()ing from frame to frame, please try to use 'absolute' time differences, because Sleep(100) will sleep about 100ms, not exactly 100ms.
If it won't work for you, you can always go for IReferenceClock, but I think that's overkill here.
So:
DateTime start=DateTime.Now;
int frameCounter=0;
while (wePush)
{
FillDataBuffer(...);
frameCounter++;
DateTime nextFrameTime=start.AddMilliseconds(frameCounter*100);
int delay=(nextFrameTime-DateTime.Now).TotalMilliseconds;
Sleep(delay);
}
EDIT:
Keep in mind: IWMWritter is time insensitive as long as you feed it with SAMPLES that are properly time-stamped.

Resources