Why does adding a renderer to my DirectShow Filter Graph smooth out audio input to the graph? - delphi

I have a DirectShow filter graph in my Delphi 6 application built with the DSPACK component library. The structure of the graph is as follows:
Custom push source audio filter
Sample Grabber
Tee Filter (but only when I turn on both the WAV File Writer and Renderer)
Renderer (preferred PC output device)
WAV File Writer
The Tee Filter is added to the graph only if I have both the Renderer and the WAV File Writer filters turned on. Otherwise I connect only the filter that is turned on directly to the Sample Grabber.
The audio is delivered over WiFi from an RTSP audio server that streams audio in real time. If I don't turn on the WAV File Writer, the audio coming out of my headphones has the typical pumping and occasional clicking sounds associated with an unbuffered audio stream. Strangely enough, as soon as I turn on the WAV File Writer filter the audio becomes smooth as glass.
I have the source code for the WAV File Writer and it basically handles the tasks of outputting the proper WAV file header when needed and writing the audio buffers as necessary, not much more than that. So I find it strange that turning it on smooths the incoming audio stream, especially since it is not upstream of the Renderer (filter) but instead is a peer filter hanging off the end of the Tee Filter alongside the Renderer.
Can anyone tell me what might be happening to make the audio delivery smooth out when I turn on the File Writer filter? Does the Tee Filter do any inherent buffering? I want to duplicate that mechanism so I can have smooth audio when the File Writer is not turned on. I'm trying to avoid adding my own buffering because I don't want to add any more delay to the real-time audio stream than I have to.

If you have a live source and you can listen to it and the delivered audio at the same time, you may be able to tell whether adding the File Writer introduces a delay, which could account for the difference. Alternatively, there may be a change in the size or number of buffers negotiated in DecideBufferSize.
I would suggest introducing explicit buffering in your push filter, for example by adding an offset to the media sample time stamps. Any inherent buffering in the Tee filter may not be reliable, and variations in delivery time are inevitable.
A more sophisticated approach, if you need to run with minimal or no buffering, could be to stretch/compress the audio while preserving the pitch.

Related

Synchronising AVAudioEngine audio recording with backing track, using AirPods

I'm trying to identify how much latency is being experienced when using AirPods, compared to using the device mic & speaker, for the purposes of recording user video & audio that must be synchronised to a backing track.
Here's how my system currently works:
I have a recording pipeline that uses AVCaptureSession to record video, and AVAudioEngine to record audio.
During the recording process, I play audio via AVAudioEngine, which the user will 'perform to'. I create a movie file using AVAssetWriter where the user's captured audio (utilising noise cancellation) is added to the file, and the backing audio file is written into a separate track.
The audio file's presentation timestamps are modified slightly to account for the initial playback delay experienced in AVAudioEngine, and this works well (I previously used AVPlayer for audio playback, where the start delay was more significant, which is what led me to this technique).
I know about AVAudioSession's inputLatency, outputLatency and bufferDuration properties, and I've read that these can be used to identify latency, at least in one sense. I notice that this calculation yields a total round-trip latency of around 0.01s when using the device on its own, and 0.05 seconds when using AirPods' inputs and outputs.
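For reference, that kind of round-trip estimate can be read from AVAudioSession roughly as below; ioBufferDuration is the Swift-level name of the buffer duration property, and whether it should be counted once or twice depends on the render path, so treat this as an approximation.

import AVFoundation

// Rough round-trip estimate from the session's reported latencies.
// The I/O buffer duration may effectively apply on both the input
// and the output side, so this is only an approximation.
let session = AVAudioSession.sharedInstance()
let roundTrip = session.inputLatency + session.outputLatency + session.ioBufferDuration
print(String(format: "Estimated round-trip latency: %.3f s", roundTrip))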
This is useful, and I can apply that extra time difference in my own logic to improve synchronisation, but there is definitely additional latency in the output, and I can't identify its source.
Strangely, it looks as though the recorded audio and video are in sync with each other, but not with the backing track. This makes me think that the system is still adding compensation to one of those two forms of captured media, but that this doesn't relate to the actively played-back audio, so the user may be listening to delayed playback that I'm not accounting for.
Does anyone have any thoughts on what other considerations may be required? I feel as though most use cases for Bluetooth synchronisation are either synchronising audio and visual output, or synchronising only the audio and visual input when recording, not a third case where the user performs alongside an on-device audio or video source that is later added to the resulting asset-writing session/media file.

Real time audio recording in Swift

I am building an application which needs to do real-time audio recording. I am using Swift for the project, so I am unable to use the Novocaine library (as it has some Obj-C++ code).
What I need is to get small chunks of the audio recording in real time, which I can process or send to my websocket. Is there a Swift library that I can use to achieve this?
In addition to getting the live audio from the microphone, I also need to show a real time waveform.
Start recording
Get an event for every few bytes of recorded data, so I can send those bytes to my websocket.
Show a waveform for the audio.
Let me know.
You do not need any third-party tools to get audio from the mic; it can be set up easily using AVAudioEngine. However, to minimise network traffic I suggest using LAME to compress the raw PCM audio stream into MP3.
Here you can find a project with minimal functionality for getting mic input and compressing it into MP3. In this example project the MP3 is stored in the Documents folder, so you can try it out and listen to make sure it works.
From this point you can take the MP3 buffer and send it via the socket. You can also play with the LAME settings to change the quality, etc.
There is another branch called no-lame where the same functionality is implemented without LAME encoding. Look here.
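If all you need before any MP3 step are the raw mic buffers (for the websocket and the waveform), a minimal AVAudioEngine sketch could look like this; the print statement stands in for your own websocket send and waveform update code.

import AVFoundation

let engine = AVAudioEngine()

func startMicCapture() throws {
    // Configure the shared audio session for recording (iOS).
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.record, mode: .default, options: [])
    try session.setActive(true)

    let input = engine.inputNode
    let format = input.outputFormat(forBus: 0)

    // The tap fires repeatedly with small PCM chunks; forward them to the
    // websocket and use them to drive the waveform display.
    input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
        guard let samples = buffer.floatChannelData?[0] else { return }
        let frameCount = Int(buffer.frameLength)
        let chunk = Data(bytes: samples, count: frameCount * MemoryLayout<Float>.size)

        // A crude level for the waveform: the peak absolute sample in this chunk.
        var peak: Float = 0
        for i in 0..<frameCount { peak = max(peak, abs(samples[i])) }

        // Replace with your websocket send and waveform update.
        print("got \(chunk.count) bytes, peak level \(peak)")
    }

    try engine.start()
}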

Read audio file, perform filters (i.e. reverb), and then write audio file without playback on iOS

I'm working on an app which has a requirement for running some basic audio filters (such as normalisation and reverb) on a file. The idea is to take an existing audio file, add the filters, and then write the data to a new file. Crucially, this must be done without any playback and should be fast (i.e. on a 60 second audio file I should be able to add reverb in under a second).
I've looked at several solutions such as The Amazing Audio Engine and AudioBox, but these all seem to rely on playing the audio back in real time rather than writing it to a file.
Does anybody have examples, or can point me in the right direction, for simply taking a file and applying a basic audio filter without listening to it. I'm sure I must be missing something simple somewhere but my searches have turned up nothing.
In general, the steps are:
Set up an AUGraph like this - AudioFilePlayer -> Reverb/Limiter/EQ/etc. -> GenericOutput
Open the input file and schedule it on the AudioFilePlayer.
Create an output file and repeatedly call AudioUnitRender on the GenericOutput unit, writing the rendered buffers to the output file.
I'm not sure about the speed of this, but it should be acceptable.
There is a comprehensive example of offline rendering in this thread that covers the setup and rendering process.
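For reference, here is a sketch of the same offline (faster-than-realtime) idea using AVAudioEngine's manual rendering mode, which plays the role of the GenericOutput unit in the AUGraph approach above; the input/output URLs and the reverb preset are assumptions.

import AVFoundation

func renderReverbOffline(from inputURL: URL, to outputURL: URL) throws {
    let sourceFile = try AVAudioFile(forReading: inputURL)
    let format = sourceFile.processingFormat

    // Player -> Reverb -> main mixer, mirroring AudioFilePlayer -> Reverb -> output.
    let engine = AVAudioEngine()
    let player = AVAudioPlayerNode()
    let reverb = AVAudioUnitReverb()
    reverb.loadFactoryPreset(.mediumHall)
    reverb.wetDryMix = 50

    engine.attach(player)
    engine.attach(reverb)
    engine.connect(player, to: reverb, format: format)
    engine.connect(reverb, to: engine.mainMixerNode, format: format)

    // Offline manual rendering: the engine is pulled as fast as possible, not in real time.
    let maxFrames: AVAudioFrameCount = 4096
    try engine.enableManualRenderingMode(.offline, format: format, maximumFrameCount: maxFrames)

    try engine.start()
    player.scheduleFile(sourceFile, at: nil)
    player.play()

    let buffer = AVAudioPCMBuffer(pcmFormat: engine.manualRenderingFormat,
                                  frameCapacity: engine.manualRenderingMaximumFrameCount)!
    let outputFile = try AVAudioFile(forWriting: outputURL, settings: sourceFile.fileFormat.settings)

    // Pull rendered audio until the whole source file has been processed.
    while engine.manualRenderingSampleTime < sourceFile.length {
        let remaining = sourceFile.length - engine.manualRenderingSampleTime
        let frames = min(maxFrames, AVAudioFrameCount(remaining))
        if try engine.renderOffline(frames, to: buffer) == .success {
            try outputFile.write(from: buffer)
        }
    }

    player.stop()
    engine.stop()
}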

Audio format to choose for Big audio files

Which audio file format is best to use for large audio files? I have many large audio files to be used in my app, but their current MP3 size is hundreds of MBs.
If you want to save storage on audio files, the file format may not change the file size much; reducing the bit rate (for example, from 320 kbps to 128 kbps) can reduce the file size significantly.
How do I do it using Microsoft's Audio Compression Manager? (Practically speaking, it's not well documented on MSDN.)
Windows provides codecs that specifically compress audio files. The audio files are typically in PCM format (WAVE_FORMAT_PCM) and get played using the simplest DirectSound method (check MSDN; it's at hand and it works).
To play a file using DirectSound, and thus in PCM format, you first create a DirectSound object, create a DirectSound buffer, and then pump the PCM data directly into the buffer using a keep-the-buffer-filled algorithm.
If you wish to use codecs, write a procedure that opens a stream file and passes it through an ACM driver object, thus (de)compressing it.
The ACM (Audio Compression Manager) driver finds a codec that suits the input source and decompresses it back to WAVE_FORMAT_PCM so that your app can play it.

is it possible for avassetwriter to output to memory

I would like to write an iPhone app that continuously captures video, H.264-encodes it in 10-second intervals, and uploads it to a storage server. This can be done with AVAssetWriter, and I can keep deleting the old files as I create new ones. However, as flash memory has a limited number of write cycles, this scheme will wear out the flash after a few thousand write cycles. Is there a way to redirect AVAssetWriter to memory, or to create a RAM drive on the iPhone?
Thanks!
Yes, AVAssetWriter is the only way to get to the hardware encoder, and simply reading back the file while it's being written doesn't give you the moov atom, so AVFoundation- or MPMediaPlayer-based players won't be able to read it. You only have a couple of choices: periodically stop the AVAssetWriter and write to the file on a background thread, effectively segmenting your movie into smaller complete files (sketched below); or deal with the incomplete MP4 on the server side, where you will have to decode the raw NALUs and recreate the missing moov atom. If you're using FFmpeg, mov.c is the source to look at. That is also where an incomplete MP4 file would fail.
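A rough sketch of the first option, rotating AVAssetWriter segments so that each finished file gets its moov atom and can be uploaded and deleted; the temporary directory, segment naming, and video settings here are placeholders for your own.

import AVFoundation

final class SegmentedMovieWriter {
    private var writer: AVAssetWriter?
    private var input: AVAssetWriterInput?
    private var sessionStarted = false
    private var segmentIndex = 0

    // Begin a new, self-contained segment file.
    func startNewSegment(videoSettings: [String: Any]) throws {
        segmentIndex += 1
        let url = FileManager.default.temporaryDirectory
            .appendingPathComponent("segment-\(segmentIndex).mp4")
        try? FileManager.default.removeItem(at: url)

        let writer = try AVAssetWriter(outputURL: url, fileType: .mp4)
        let input = AVAssetWriterInput(mediaType: .video, outputSettings: videoSettings)
        input.expectsMediaDataInRealTime = true
        writer.add(input)
        writer.startWriting()

        self.writer = writer
        self.input = input
        self.sessionStarted = false
    }

    // Append captured sample buffers from the capture callback.
    func append(_ sampleBuffer: CMSampleBuffer) {
        guard let writer = writer, let input = input, writer.status == .writing else { return }
        if !sessionStarted {
            writer.startSession(atSourceTime: CMSampleBufferGetPresentationTimeStamp(sampleBuffer))
            sessionStarted = true
        }
        if input.isReadyForMoreMediaData {
            input.append(sampleBuffer)
        }
    }

    // Finish the current segment; finishWriting writes the moov atom, after which
    // the file is a complete, playable MP4 that can be uploaded and then deleted.
    func finishSegment(completion: @escaping (URL) -> Void) {
        guard let writer = writer, let input = input else { return }
        input.markAsFinished()
        writer.finishWriting {
            completion(writer.outputURL)
        }
        self.writer = nil
        self.input = nil
    }
}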
