I would like to use AVFoundation for microphone input for speech detection, as shown in this iOS example, and simultaneously detect the pitch of the user's voice through the same microphone input using AudioKit. The latter API is probably a wrapper around the former, but it has its own classes and initialization. Is there a way to provide AudioKit with an existing microphone configuration like the one in the speech example, or some alternative way to use the Speech API and AudioKit's microphone pitch detection simultaneously? How might I achieve this?
EDIT: The question is a little more complex
I need to be able to synchronize 3 things: touch events, AudioKit detection times, and speech detection times. Each of these operates on a different timebase. Speech gives me segment timestamps relative to the beginning of the audio recording, the timestamps for UITouch events will be different, and I am not sure what AudioKit uses for its timestamps. There is some mention of host time and AV timestamps here, but I'm not sure this will get me anywhere.
Speech and audio synchronization is a little unclear. May I have a lead on how this might work?
Related
When a person speaks far away from the phone, the recorded voice is quiet; when they speak close to it, the recorded voice is loud. I want to play back the human voice at equal volume no matter how far away (within reason) the speaker is from the phone when the voice is recorded.
What I have already tried:

1. Adjusting the volume based on the dB level (e.g. via AVAudioPlayer). The problem is that the dB level includes all the environmental sound, so it only works when the human voice varies heavily.
2. I then thought I should find a way to sample the intensity of the human voice in the media, which led me to voice recognition. But this is a huge topic, and I cannot narrow down the areas that could solve my problem.
The voice recorded from a distance suffers from significant corruption. One problem is noise; another is echo. To amplify it you need to clean the voice of echo and noise. Ideally you would do that with a better microphone, but if only a single microphone is available you have to apply signal processing. The signal processing algorithms you are interested in are:
Noise cancellation: you can find many examples on Google, from simple to very advanced ones.
Echo cancellation: again, you can find many implementations.
There is no ready-made library to do the above; you will have to implement a large part yourself. You can look at the WebRTC code, which has both noise and echo cancellation, as described in this question:
Is it possible to reduce background noise while streaming audio on the iPhone?
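To illustrate just the volume-leveling half of the question (not the noise or echo cancellation discussed above), here is a minimal sketch of RMS-based gain normalization in plain Swift. The function names are illustrative, not from any framework, and a real automatic gain control would smooth the gain over time instead of recomputing it per buffer.

```swift
import Foundation

// Compute the RMS (root-mean-square) level of a buffer of samples.
func rms(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(0) { $0 + $1 * $1 }
    return sqrt(sumOfSquares / Float(samples.count))
}

// Rescale a buffer so its RMS level approaches a fixed target.
// This only levels the volume; it does NOT separate voice from noise.
func normalizeToTarget(_ samples: [Float], targetRMS: Float = 0.2) -> [Float] {
    let level = rms(samples)
    guard level > 1e-6 else { return samples }  // avoid amplifying silence / noise floor
    let gain = targetRMS / level
    return samples.map { max(-1, min(1, $0 * gain)) }  // clamp to the valid [-1, 1] range
}
```

Because the gain is derived from the whole buffer, distant quiet speech and nearby loud speech come out at roughly the same level, which is also why background noise gets amplified along with the voice in quiet buffers.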
I am building an iOS app that allows the user to play guitar sounds - e.g. plucking or strumming.
I'd like to allow the user to apply pitch shifting or wah-wah (compression) on the guitar sound being played.
Currently, I am using audio samples of the guitar sound.
I've done some basic reading on DSP and audio synthesis, but I'm no expert in it. I saw libraries such as Csound and STK, and it appears that the sounds they produce are synthesized (i.e. not played from audio samples). I am not sure how to apply them, or whether I can use them to apply effects such as pitch shifting or wah-wah to audio samples.
Can someone point me in the right direction for this?
You can use open-source audio processing libraries. Essentially, you are getting audio samples in, and you need to process them and send them out as samples. The processing can be done by these libraries, or you can use your own. Here's one DSP library (disclaimer: I wrote it). Look at the process(float, float) method of any of the classes to see how this is done.
Wah-wah and compression are two completely different effects. Wah-wah is a lowpass filter whose center frequency varies slowly, whereas compression is a method of equalizing the volume. The above library has a Compressor class that you can check out.
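To make the time-varying-filter idea concrete, here is a toy sketch in plain Swift: a one-pole lowpass whose cutoff is swept by a low-frequency oscillator (LFO). All names here are my own invention, and a real wah pedal uses a resonant bandpass filter rather than this plain lowpass, but the sweeping mechanism is the same.

```swift
import Foundation

// Toy wah-wah: a one-pole lowpass filter whose cutoff frequency is
// slowly swept between minCutoff and maxCutoff by an LFO.
func wah(_ input: [Float], sampleRate: Float = 44_100,
         lfoHz: Float = 2, minCutoff: Float = 400, maxCutoff: Float = 2_000) -> [Float] {
    var y: Float = 0                      // filter state (previous output)
    var output = [Float]()
    output.reserveCapacity(input.count)
    for (n, x) in input.enumerated() {
        // LFO in [0, 1] drives the cutoff sweep.
        let sweep = 0.5 * (1 + sin(2 * Float.pi * lfoHz * Float(n) / sampleRate))
        let cutoff = minCutoff + (maxCutoff - minCutoff) * sweep
        // One-pole lowpass coefficient for this sample's cutoff.
        let a = exp(-2 * Float.pi * cutoff / sampleRate)
        y = (1 - a) * x + a * y
        output.append(y)
    }
    return output
}
```

The per-sample coefficient recomputation is wasteful; in practice you would update the cutoff once per block.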
The STK does have effects classes as well, not just synthesis classes (JCRev, for example, is a reverb), but I would highly recommend staying away from them, as they are really hard to compile and maintain.
If you haven't seen it already, check out Julius Smith's excellent and comprehensive book Physical Audio Signal Processing.
How do I play multiple audio files and change their volume using Novocaine?
thanks!
There is a similar question for which I wrote quite a lengthy response:
Using Novocaine in an audio app
Basically, playing multiple sounds at once involves mixing the various sounds down sample by sample. Changing the volume involves multiplying the samples in the audio buffer by some amplitude value; that is, if you want your output to be twice as loud, simply multiply every sample by 2.0f. The Accelerate framework can help you with this.
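As a sketch of the mixing and volume math described above (illustrative names, plain Swift rather than Novocaine's render callback or the Accelerate routines you would use in production):

```swift
import Foundation

// Mix several tracks down to one buffer, scaling each track by its own
// volume, and clamp the sum to the valid [-1, 1] sample range.
func mix(_ tracks: [[Float]], volumes: [Float]) -> [Float] {
    let length = tracks.map { $0.count }.max() ?? 0
    var out = [Float](repeating: 0, count: length)
    for (track, volume) in zip(tracks, volumes) {
        for i in 0..<track.count {
            out[i] += track[i] * volume   // volume change = multiply by an amplitude
        }
    }
    return out.map { max(-1, min(1, $0)) } // clamp to avoid clipping artifacts
}
```

On iOS, vDSP routines in the Accelerate framework perform the same multiply-and-accumulate far faster than this scalar loop.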
Do I need to use FFT? Is there an SDK or something for this?
Without FFT you can visualize the "scope" of the audio, i.e. the actual waveform of the sound. If you want a sonogram, or something like what a graphics equalizer does, you'll need to preprocess the audio with an FFT. There is SDK support for this: the Accelerate framework. You might want to check out Apple's aurioTouch2 example.
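To make concrete what the FFT step produces for a sonogram or equalizer view, here is a brute-force discrete Fourier transform in plain Swift. It is O(n²) and only for illustration; vDSP's FFT in the Accelerate framework computes the same magnitudes in O(n log n).

```swift
import Foundation

// Naive DFT: magnitude of each frequency bin up to the Nyquist frequency.
// Each bin k corresponds to frequency k * sampleRate / n.
func magnitudeSpectrum(_ samples: [Float]) -> [Float] {
    let n = samples.count
    return (0..<n / 2).map { k in
        var re: Float = 0, im: Float = 0
        for (t, x) in samples.enumerated() {
            let phase = -2 * Float.pi * Float(k) * Float(t) / Float(n)
            re += x * cos(phase)
            im += x * sin(phase)
        }
        return sqrt(re * re + im * im)
    }
}
```

Feeding this the waveform buffer per frame, and stacking the resulting columns over time, is exactly what a sonogram display does.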
How can I detect that speech has started in some audio file? I only need to detect the start and stop of speech, without recognition.
Thank you.
Check out this app
http://developer.apple.com/library/ios/#samplecode/SpeakHere/Introduction/Intro.html
You can tinker with this sample code a little to get what you need.
Here is one more link that I have come across:
http://developer.apple.com/library/ios/#samplecode/aurioTouch/Introduction/Intro.html#//apple_ref/doc/uid/DTS40007770
You could use a pitch detector to listen for the presence of harmonic tones within the range of human speech. I don't know of any pitch detector for iOS, though. I wrote my own, and it was very hard.
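For a sense of what such a detector involves, here is a hedged sketch of the simplest common approach, autocorrelation-based pitch detection, in plain Swift. This is my own illustrative code, not from any library; production detectors add windowing, normalization (e.g. the YIN difference function), voicing thresholds, and sub-sample interpolation on top of this.

```swift
import Foundation

// Pick the lag (in samples) whose autocorrelation is highest within the
// expected pitch range, and convert that lag back to a frequency in Hz.
func detectPitch(_ samples: [Float], sampleRate: Float = 44_100,
                 minHz: Float = 60, maxHz: Float = 1_000) -> Float? {
    let minLag = Int(sampleRate / maxHz)     // shortest period considered
    let maxLag = Int(sampleRate / minHz)     // longest period considered
    guard samples.count > maxLag, minLag <= maxLag else { return nil }
    var bestLag = 0
    var bestScore: Float = 0
    for lag in minLag...maxLag {
        var score: Float = 0
        for i in 0..<(samples.count - lag) {
            score += samples[i] * samples[i + lag]
        }
        if score > bestScore {
            bestScore = score
            bestLag = lag
        }
    }
    return bestLag > 0 ? sampleRate / Float(bestLag) : nil
}
```

For the speech start/stop question above, the presence of a stable detected pitch in the speech range is itself a usable (if crude) voice-activity signal.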
Dirac does pitch detection; I don't know how accurate it is, because I don't want to spend £1000 on the licence.