Active noise cancellation - delphi

I have written a voice recognition program, and I am having problems with the mic hearing me over the music the computer is playing. I need software that can filter the sound coming out of the speakers from the sound entering the mic.
Is there software or a component (for Delphi) that would solve my problem?

You need to capture:
the computer output (Stream1)
the mic input (Stream2)
Then you need to find two parameters, which depend on your mic location and the sound system delay. These two parameters are the delay n and the gain k:
Stream1[t+n]*k=Stream2[t]
where t is time. Once you have found these parameters, the resulting stream, containing only the speech picked up by the mic, is
Stream2[t]-Stream1[t+n]*k=MusicReductionStream[t]
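As a rough illustration of the idea above, here is a minimal Python sketch (not Delphi, and not taken from any library) that searches for the delay and gain and then subtracts the scaled, delayed playback from the mic signal. The function names, the brute-force search and the max_delay parameter are assumptions made for the example. The code uses a non-negative lag d with recorded[t] ≈ k * played[t - d], which corresponds to n = -d in the notation above.

```python
import math


def estimate_delay_and_gain(played, recorded, max_delay):
    """Return (d, k) such that recorded[t] is best explained by k * played[t - d].

    The lag d (0..max_delay) is chosen by normalised correlation; the gain k
    is the least-squares fit at that lag.
    """
    best_d, best_k, best_score = 0, 0.0, -1.0
    for d in range(max_delay + 1):
        # Pair each mic sample with the playback sample emitted d samples earlier.
        stop = min(len(recorded), len(played) + d)
        pairs = [(recorded[t], played[t - d]) for t in range(d, stop)]
        rec_energy = sum(r * r for r, _ in pairs)
        play_energy = sum(p * p for _, p in pairs)
        if rec_energy == 0.0 or play_energy == 0.0:
            continue
        cross = sum(r * p for r, p in pairs)
        score = abs(cross) / math.sqrt(rec_energy * play_energy)
        if score > best_score:
            best_d, best_k, best_score = d, cross / play_energy, score
    return best_d, best_k


def subtract_playback(played, recorded, d, k):
    """Return recorded[t] - k * played[t - d] for every mic sample."""
    cleaned = []
    for t, r in enumerate(recorded):
        p = played[t - d] if 0 <= t - d < len(played) else 0.0
        cleaned.append(r - k * p)
    return cleaned
```

A single delay and gain ignores room reverberation and speaker colouration, so on real recordings this would be expected to reduce the music rather than remove it.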

I think you want to do what noise canceling microphones do. These systems use at least one extra microphone to calculate the difference between "surrounding noise" and the noise that is aimed directly at the microphone (the speech it has to register). I don't think you can reliably obtain the same effect with a software-only solution.
A first step would obviously be to turn the music down :-)

Check out the AsioVST library.
100% open source Delphi code
Free
Very complete
Active (support for xe2 / x64 is being added for example)
Under Examples\Plugins\Crosstalk Cancellation\ you'll find the source code for a plugin that probably does what you're looking for.
The magic happens in DAV_DspCrosstalkCancellation.pas.

I think the Speex preprocessor has an echo-cancellation feature. You feed it the audio you recorded and the audio you want to cancel, and it tries to remove the latter from the former.
The main problem is finding out what audio your computer is playing. I'm not sure whether there is a good API for that.
It also has noise reduction and voice activity detection features. You can compile it as a DLL and then write a Delphi header for it.

You need to estimate the impulse response of the speaker and room, which can change with the exact speaker and mic positions and with the size and contents of the room, and you also need to know or estimate the system delay.
If the person or the mic can move, the impulse response and delay will need to be continually re-estimated.
Once you have estimated the impulse response, you can convolve it with the output signal and try subtracting delayed versions of the result from the mic input until you can null the silent portions of the speech input. Cross-correlation might be useful for estimating the delay.
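One common way to do this kind of continual re-estimation is an adaptive FIR filter. The sketch below uses normalised LMS (NLMS), which is a technique swapped in for illustration and not something the answer above names. The function name and the taps, mu and eps parameters are assumptions. The filter weights play the role of the estimated impulse response, the weighted sum is the convolution with the output signal, and the returned residual is the mic signal with the estimated playback removed.

```python
def nlms_cancel(played, recorded, taps=8, mu=0.5, eps=1e-8):
    """Adaptive cancellation of `played` from `recorded`.

    Returns (residual, weights). `weights` approximates the impulse response h
    with recorded[t] ~ sum(h[i] * played[t - i] for i in range(taps)).
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # history[i] holds played[t - i]
    residual = []
    for x, mic in zip(played, recorded):
        history = [x] + history[:-1]
        # Convolution of the current impulse response estimate with the playback.
        estimate = sum(w * h for w, h in zip(weights, history))
        error = mic - estimate
        # Normalised step: scale the update by the recent playback energy.
        step = mu * error / (eps + sum(h * h for h in history))
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights
```

While someone is speaking, the speech acts as noise on the adaptation, so a practical system would normally slow or pause the update during speech. That logic is left out here.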

Related

Synchronising AVAudioEngine audio recording with backing track, using AirPods

I'm trying to identify how much latency is being experienced when using AirPods, compared to using the device mic & speaker, for the purposes of recording user video & audio that must be synchronised to a backing track.
Here's how my system currently works:
I have a recording pipeline that uses AVCaptureSession to record video, and AVAudioEngine to record audio.
During the recording process, I play audio via AVAudioEngine, which the user will 'perform to'. I create a movie file using AVAssetWriter where the user's captured audio (utilising noise cancellation) is added to the file, and the backing audio file is written into a separate track.
The audio file's presentation timestamps are modified slightly to account for the initial playback delay experienced in AVAudioEngine, and this works well (I previously used AVPlayer for audio playback, where the start delay was more significant, which is what led me to this technique).
I know about AVAudioSession's inputLatency, outputLatency and bufferDuration properties, and I've read that these can be used to identify latency, at least in one sense. I notice that this calculation yields a total round-trip latency of around 0.01s when using the device on its own, and 0.05 seconds when using AirPods' inputs and outputs.
This is useful, and I can apply that extra time difference in my own logic to improve synchronisation, but there is definitely additional latency in the output, and I can't identify its source.
Strangely, it looks as though the recorded audio and video are in sync with each other, but not with the backing track. This makes me think that the system is still adding compensation to one of those two forms of captured media, but that this compensation doesn't relate to the audio being played back, so the user is potentially listening to delayed playback and I'm not accounting for that extra delay.
Does anyone have any thoughts on what other considerations may be required? I feel as though most use cases for Bluetooth synchronisation will be either to synchronise audio and visual output, or to synchronise only the audio and visual input when recording, not a third case in which the user is performing alongside an on-device audio or video source that is later added to the resulting asset-writing session/media file.

AudioUnit recording glitches every 30 seconds

I've used this sample code to create an audio recorder. http://www.stefanpopp.de/capture-iphone-microphone/
I'm finding I get glitches about every 30 seconds. They sound a bit like buffer glitches to me, although I might be wrong. I've tried contacting the author of the article but not having much success. I'm really struggling to follow some of this code. I think it's missing a circular buffer but I'm not sure how important that is here. I'm hoping someone can point me in the right direction to either:
Point me to some different example code or suggest what I need to add to this (high level suggestion is fine - I'm happy to research and do the work, I'm just not confident what the work is)
Suggest some better values to use for things like the buffer data size.
Tell me that there's nothing wrong with this code and my bug is almost certainly elsewhere.
Suggest a library I can use that should take care of it (Amazing Audio Engine 2 looks good for me but I'm a bit worried about the note saying it's retired. AudioKit looks great too but it's missing a peak power reading, which would be a shame to have to implement myself after having imported such a complex library)
Why aren't I using AVAudioSession? I need the user to be able to set the mic level while recording and to be able to listen back at the same time. Previously I did this with AVAudioSession, but on more recent devices isInputGainSettable returns NO. It also returns NO for many hardware mics plugged in via Lightning cable, which we're seeing more and more now that the headphone jack is gone.
Several problems.
Apple recommends that object methods not be called in the audio context (the callbacks). Your code has several. Use C functions instead.
Newer iOS devices likely use a hardware sample rate of 48000, not 44100. Resampling potentially causes buffers to change sizes.
The code seems to assume that the play callback buffer was the same size as the input callback buffer. This is not guaranteed. Thus the playback might end up with too few samples, causing periodic glitches.
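The usual way to stop relying on equal buffer sizes is to put a circular (ring) buffer between the two callbacks: the input callback writes whatever it receives, and the play callback reads exactly what it needs, padded with silence on underrun. The Python sketch below only illustrates the bookkeeping. The class and method names are made up for the example, and a real render callback would use a lock-free C implementation, in line with the advice above about avoiding object method calls in the audio context.

```python
class RingBuffer:
    """Fixed-capacity FIFO of samples for one producer and one consumer."""

    def __init__(self, capacity):
        self._data = [0.0] * capacity
        self._capacity = capacity
        self._read = 0   # index of the oldest stored sample
        self._count = 0  # number of samples currently stored

    def write(self, samples):
        """Store as many samples as fit and return how many were stored."""
        n = min(len(samples), self._capacity - self._count)
        for i in range(n):
            self._data[(self._read + self._count + i) % self._capacity] = samples[i]
        self._count += n
        return n

    def read(self, n):
        """Return exactly n samples, padding with silence on underrun."""
        available = min(n, self._count)
        out = [self._data[(self._read + i) % self._capacity] for i in range(available)]
        self._read = (self._read + available) % self._capacity
        self._count -= available
        return out + [0.0] * (n - available)
```

With this in place the input callback can deliver, say, 940 frames while the play callback asks for 1024, and the mismatch produces a short run of silence at worst instead of stale or missing samples.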
In my experience (iPhone 6), the microphone sample rate can be 48000 when a headset is not plugged in, and change to 44100 when a headset is plugged in.
If your AudioUnit is expecting a sample rate of 44100, then glitches like these are to be expected. To verify, check whether the problem remains when you plug in a headset.
A workaround for the glitch problem seems to be to use an AVAudioEngine. Connect its inputNode to its mainMixerNode using the inputFormat of the inputNode. Connect the mainMixerNode to your AudioUnit in your desired format. Connect your AudioUnit to outputNode of the AVAudioEngine.
Using this mixerNode between inputNode and audioUnit is essential in this workaround.

How to find the offset between two audio files? One is noisy and one is clear

I have a scenario in which the user records a concert on video, with the live audio of the performer, while at the same time the device downloads a live stream from an audio broadcaster device. Later I replace the noisy live audio (captured while recording) with the good-quality audio that I streamed and saved on the phone. Right now I set the audio offset manually, by trial and error, while merging, so that the audio and video line up at the exact position.
What I want to do now is automate the audio synchronisation. Instead of merging the video with the clear audio at a given offset, I want to merge them automatically with proper sync.
For that I need to find the offset at which I should replace the noisy audio with the clear audio. For example, when the user starts and stops the recording, I will take that sample of live audio, compare it with the live-streamed audio, extract the matching part of the streamed audio, and sync it at the right time.
Does anyone have any idea how to find the offset by comparing two audio files and sync the result with the video?
Here's a concise, clear answer.
• It's not easy - it will involve signal processing and math.
• A quick Google gives me this solution, code included.
• There is more info on the above technique here.
• I'd suggest gaining at least a basic understanding before you try and port this to iOS.
• I would suggest you use the Accelerate framework on iOS for fast Fourier transforms etc
• I don't agree with the other answer about doing it on a server - devices are plenty powerful these days. A user wouldn't mind a few seconds of processing for something seemingly magic to happen.
Edit
As an aside, I think it's worth taking a step back for a second. While math and fancy signal processing like this can give great results, and do some pretty magical stuff, there can be outlying cases where the algorithm falls apart (hopefully not often).
What if, instead of getting complicated with signal processing, there's another way? After some thought, there might be. If you meet all the following conditions:
• You are in control of the server component (audio broadcaster device)
• The broadcaster is aware of the 'real audio' recording latency
• The broadcaster and receiver are communicating in a way that allows accurate time synchronisation
...then the task of calculating audio offset becomes reasonably trivial. You could use NTP or some other more accurate time synchronisation method so that there is a global point of reference for time. Then, it is as simple as calculating the difference between audio stream time codes, where the time codes are based on the global reference time.
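To make the time-code arithmetic concrete, here is a small Python sketch. It assumes both start times are expressed in seconds on the shared reference clock. The function name, the parameters and the sign convention for known_latency are hypothetical and would depend on how the broadcaster actually reports its latency.

```python
def clean_audio_start_sample(recording_start, stream_start, sample_rate, known_latency=0.0):
    """Index into the saved clean stream that lines up with the first recorded sample.

    recording_start: shared-clock time at which the phone recording began.
    stream_start:    shared-clock time of the first sample in the saved stream.
    known_latency:   fixed latency to compensate for (assumed to delay the recording).
    """
    offset_seconds = (recording_start - known_latency) - stream_start
    if offset_seconds < 0:
        raise ValueError("recording began before the saved stream")
    return round(offset_seconds * sample_rate)
```

For example, a recording that starts 10 seconds after the saved stream at 44100 Hz would begin at sample 441000 of the clean audio.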
This could prove to be a difficult problem: even though the signals are of the same event, the presence of noise makes a comparison harder. You could consider running some post-processing to reduce the noise, but noise reduction in itself is an extensive, non-trivial topic.
Another problem could be that the signals captured by the two devices could actually differ a lot. For example, the good-quality audio (I guess the output from the live mix console?) will be fairly different from the live version (which I guess is coming out of the on-stage monitors/FOH system and captured by a phone mic?).
Perhaps the simplest possible approach to start would be to use cross correlation to do the time delay analysis.
A peak in the cross correlation function would suggest the relative time delay (in samples) between the two signals, so you can apply the shift accordingly.
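A minimal Python sketch of that approach follows. It slides the short noisy clip along the longer clean recording and returns the position with the highest normalised cross-correlation. The function name is an assumption, both signals are assumed to be mono lists of floats at the same sample rate, and the direct time-domain loop is only practical for short signals. For full-length audio the same correlation is normally computed with FFTs.

```python
import math


def find_offset(reference, clip):
    """Return (offset, score): where in `reference` the `clip` matches best.

    score is the normalised cross-correlation at that offset (1.0 = identical shape).
    """
    n = len(clip)
    clip_norm = math.sqrt(sum(c * c for c in clip))
    best_offset, best_score = 0, -1.0
    if clip_norm == 0.0:
        return best_offset, best_score
    for offset in range(len(reference) - n + 1):
        window = reference[offset:offset + n]
        window_norm = math.sqrt(sum(w * w for w in window))
        if window_norm == 0.0:
            continue
        score = sum(w * c for w, c in zip(window, clip)) / (window_norm * clip_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

The returned offset is in samples, so dividing it by the sample rate gives the time at which the clean audio should start. A low best score is a hint that the two recordings differ too much for a plain correlation to be trusted.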
I don't know a lot about the subject, but I think you are looking for "audio fingerprinting". Similar question here.
An alternative (and more error-prone) way is to run both sounds through a speech-to-text library (or an API) and match the relevant parts. This would of course not be very reliable: sentences frequently repeat in songs, and the concert may be instrumental.
Also, audio processing on a mobile device may not work well (because of low performance, high battery drain, or both). I suggest using a server if you go that way.
Good luck.

Match a sound in recorded audio stream

I have a PCM stream incoming from the microphone. I am analyzing short chunks of it (in Java) to detect short spikes in loudness (amplitude). I have a known sound that plays periodically, and I need to know whether a detected spike is in fact a recording of this sound. I have the PCM data for the played sound, so it is completely known.
I have no clue where to start: should I perform the comparison in the time domain or the frequency domain? It would be great if someone could give me some insight into how this is done and where I should dig.
Thanks.
It sounds like you want to compare an incoming set of pulses to a reference set of pulses. Cross-correlation is probably what you want to use. You may need to precondition your data first, e.g. create an envelope instead of using the raw data, or the cross-correlation may fail unless the match is perfect.
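A minimal sketch of that preconditioning, in Python for brevity although the question uses Java: the envelope is taken as a moving average of the rectified signal, and the chunk around a detected spike is compared with the reference by correlating the two envelopes. The function names, the window length and the 0.8 threshold are assumptions to tune on real data. The means are removed before correlating because envelopes are never negative, which would otherwise make every pair look similar.

```python
import math


def envelope(samples, window=8):
    """Amplitude envelope: moving average of the rectified signal."""
    rectified = [abs(s) for s in samples]
    out = []
    running = 0.0
    for i, r in enumerate(rectified):
        running += r
        if i >= window:
            running -= rectified[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


def correlation(a, b):
    """Mean-removed, normalised correlation of two sequences (range -1..1)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 0.0 if denom == 0.0 else sum(x * y for x, y in zip(da, db)) / denom


def matches_template(chunk, template, window=8, threshold=0.8):
    """True if the chunk's envelope has roughly the same shape as the template's."""
    return correlation(envelope(chunk, window), envelope(template, window)) >= threshold
```

Because only the envelope is compared, a recording that is quieter or has inverted polarity still matches, whereas correlating the raw samples would penalise both.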

DirectSound stream synchronisation

I have a question regarding the synchronization of 2 DirectSound streams.
To record and play sound I currently use PortAudio to open 2 DirectSound streams.
There are 2 callback functions, which are called every time the input buffer is filled and the output buffer needs data.
Now here's my problem...
The input stream is running at a 48kHz sample rate (1024 samples per buffer). The output stream is running at a 192kHz sample rate (4096 samples per buffer). Every time the input buffer is filled and the callback is called, I do some DSP and then convert the result to 192kHz. The output stream takes the result and outputs the data. Now the 2 streams are running completely out of sync.
I have looked through the entire PortAudio API but I can't find a sync option to lock the 2 streams together.
Is there any way to lock 2 DirectSound streams? I really need 48kHz input and 192kHz output.
Br,
Vincent Bruinink.
The thing is that you can't really open two streams "at the same time", nor can you open two devices (or even one device at two different sample rates) and expect them to stay truly in sync, even if they were, at one time, in sync. To understand why, you may want to read something about how audio works on a computer. You may also want to read this document, which is specific to PortAudio.
As an alternative, you may want to consider opening a single device in a single stream and using software sample-rate conversion.
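As a sketch of what software sample-rate conversion inside a single callback could look like, the Python class below upsamples by an integer factor using linear interpolation, so a 1024-sample block at 48kHz becomes a 4096-sample block at 192kHz. The class name is made up for the example, and linear interpolation is chosen only because it is short. It leaves audible imaging, so a real converter would use a proper low-pass (for example polyphase) filter instead.

```python
class LinearUpsampler:
    """Integer-factor upsampler that interpolates between consecutive input samples."""

    def __init__(self, factor):
        self.factor = factor
        self._previous = 0.0  # last input sample of the previous block

    def process(self, block):
        """Return len(block) * factor samples; state carries over between blocks."""
        out = []
        for sample in block:
            # Emit `factor` points on the line from the previous sample to this one.
            for i in range(1, self.factor + 1):
                out.append(self._previous + (sample - self._previous) * i / self.factor)
            self._previous = sample
        return out
```

Keeping the last sample between calls avoids a discontinuity at every block boundary, which matters when the conversion runs once per callback.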

Resources