How can I detect when speech starts in an audio file? I only need to detect the start and stop of speech, without recognition.
Thank you.
Check out this app
http://developer.apple.com/library/ios/#samplecode/SpeakHere/Introduction/Intro.html
You can tinker with this sample code a little to get what you need...
Here is one more link that I have come across
http://developer.apple.com/library/ios/#samplecode/aurioTouch/Introduction/Intro.html#//apple_ref/doc/uid/DTS40007770
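Both samples show how to pull level data out of Core Audio; reducing that to start/stop detection can be as simple as a threshold test on short blocks of the file. Here is a minimal sketch, assuming a fixed RMS threshold (the 0.01 level and 50 ms block size are illustrative guesses, and real audio will want smoothing or hysteresis):

```swift
import AVFoundation

// A minimal sketch: walk the file in ~50 ms blocks and report where the
// RMS level crosses a fixed threshold. The threshold and block size are
// illustrative, not tuned values.
func detectSpeechSpans(in url: URL) throws {
    let file = try AVAudioFile(forReading: url)
    let format = file.processingFormat
    let blockFrames = AVAudioFrameCount(format.sampleRate * 0.05)   // ~50 ms
    guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: blockFrames) else { return }

    var speaking = false
    while file.framePosition < file.length {
        try file.read(into: buffer, frameCount: blockFrames)
        guard let samples = buffer.floatChannelData?[0], buffer.frameLength > 0 else { break }

        // Root-mean-square level of this block (channel 0).
        var sumSquares: Float = 0
        for i in 0..<Int(buffer.frameLength) { sumSquares += samples[i] * samples[i] }
        let rms = sqrt(sumSquares / Float(buffer.frameLength))

        let loud = rms > 0.01
        if loud != speaking {
            speaking = loud
            let t = Double(file.framePosition) / format.sampleRate
            print(loud ? "speech starts near \(t)s" : "speech stops near \(t)s")
        }
    }
}
```

A fixed threshold will misfire against background noise; the VAD discussion later in this section covers an adaptive variant.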
You could use a pitch detector to listen for the presence of harmonic tones within the range of human speech. I don't know of any pitch detector for iOS, though. I wrote my own, and it was very hard.
Dirac does pitch detection; I don't know how accurate it is, because I don't want to spend £1000 on the licence.
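For a sense of what is involved, here is a toy autocorrelation-based estimator restricted to roughly the speech range. This is a hypothetical sketch, not the answerer's detector; a usable one would add windowing, peak interpolation, and a proper voicing decision:

```swift
import Foundation

// A toy autocorrelation pitch estimator, limited to roughly the human
// speech range (~75-400 Hz). Returns nil when the frame doesn't look
// pitched. The 0.3 voicing ratio is an illustrative guess.
func estimatePitch(_ samples: [Float], sampleRate: Float) -> Float? {
    let minLag = Int(sampleRate / 400)   // highest pitch considered
    let maxLag = Int(sampleRate / 75)    // lowest pitch considered
    guard samples.count > maxLag else { return nil }

    var bestLag = 0
    var bestCorr: Float = 0
    for lag in minLag...maxLag {
        var corr: Float = 0
        for i in 0..<(samples.count - lag) {
            corr += samples[i] * samples[i + lag]
        }
        if corr > bestCorr {
            bestCorr = corr
            bestLag = lag
        }
    }

    // Require some minimum correlation before calling the frame "pitched".
    var energy: Float = 0
    for s in samples { energy += s * s }
    guard energy > 0, bestCorr / energy > 0.3 else { return nil }
    return sampleRate / Float(bestLag)
}
```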
I would like to use AVFoundation's microphone input for speech detection, as shown in this iOS example, and simultaneously detect the pitch of the user's voice through the same microphone input using AudioKit. The latter API is probably a wrapper around the former, but has its own classes and initialization. Is there a way to provide AudioKit with an existing microphone configuration, as in the speech example, or some alternative way to use the Speech API and AudioKit's microphone pitch detection simultaneously? How might I achieve this?
EDIT: The question is a little more complex
I need to be able to synchronize three things: touch events, AudioKit detection times, and speech detection times. Each of these operates on a different timebase. Speech gives me segment timestamps relative to the beginning of the audio recording. The timestamps for UITouch events will be different. I am not sure what AudioKit uses for its timestamps. There is some mention of host time and AV timestamps here, but I'm not sure this will get me anywhere.
How speech and audio synchronization works is a little unclear to me. Could someone give me a lead on how this might be done?
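One hedged sketch of how this can fit together, assuming two things: (a) the AVAudioEngine tap from the speech example can fan its buffers out to more than one consumer, and (b) UITouch.timestamp and the host-time clock that AVAudioTime wraps both count seconds since boot, so they can be compared directly. The Synchronizer class and its method names are hypothetical illustrations, not AudioKit or Speech API:

```swift
import AVFoundation
import Speech

// Puts speech segment times and touch timestamps on one timeline by
// capturing the host time of the first tapped buffer as "recording start".
final class Synchronizer {
    private let engine = AVAudioEngine()
    private let request = SFSpeechAudioBufferRecognitionRequest()
    private var recordingStart: TimeInterval?   // seconds-since-boot of first buffer

    func start() throws {
        let input = engine.inputNode
        let format = input.outputFormat(forBus: 0)
        input.installTap(onBus: 0, bufferSize: 1024, format: format) { [weak self] buffer, when in
            guard let self = self else { return }
            if self.recordingStart == nil, when.isHostTimeValid {
                self.recordingStart = AVAudioTime.seconds(forHostTime: when.hostTime)
            }
            self.request.append(buffer)
            // The same buffer could be handed to AudioKit's pitch tracker here,
            // so both analyses see identical audio and host times.
            // (An SFSpeechRecognizer task consuming `request` is omitted.)
        }
        engine.prepare()
        try engine.start()
    }

    // Map a Speech segment timestamp (offset from the start of the audio)
    // onto the boot-relative timeline that UITouch.timestamp uses.
    func absoluteTime(ofSegment segment: SFTranscriptionSegment) -> TimeInterval? {
        guard let start = recordingStart else { return nil }
        return start + segment.timestamp
    }
}
```

With everything expressed as seconds since boot, touch timestamps can be compared directly against speech segment times; whatever timebase AudioKit reports could be bridged the same way if it exposes host times.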
When I try Google speech recognition, it shows low performance on traditional Chinese audio files with background noise. Can I improve the performance of speech recognition with some pre-processing (like speech enhancement)? Does that work with the Google speech service?
I would suggest that you go through this page in the Google Cloud Speech documentation, which outlines best practices on how to provide speech data to the service, including recommendations for pre-processing.
Keep the recording as close to the original speech signal as possible: no distortion, no clipping, no noise, and no artificial pre-processing such as noise suppression and automatic gain control. I think that kind of pre-processing can damage the useful information in speech signals.
I copied the key points from Google and pasted them below.
Position the microphone as close as possible to the person that is speaking, particularly when background noise is present.
Avoid audio clipping.
Do not use automatic gain control (AGC).
All noise reduction processing should be disabled.
Listen to some sample audio. It should sound clear, without distortion or unexpected noise.
I am building an iOS app that allows the user to play guitar sounds - e.g. plucking or strumming.
I'd like to allow the user to apply pitch shifting or wah-wah (compression) on the guitar sound being played.
Currently, I am using audio samples of the guitar sound.
I've done some basic reading on DSP and audio synthesis, but I'm no expert in it. I saw libraries such as Csound and STK, and it appears that the sounds they produce are synthesized (i.e., not played from audio samples). I am not sure how to apply them, or whether I can use them to apply effects such as pitch shifting or wah-wah to audio samples.
Can someone point me in the right direction for this?
You can use open-source audio processing libraries. Essentially, you are getting audio samples in, and you need to process them and send them out as samples. The processing can be done by these libraries, or you can use one of your own. Here's one DSP library (disclaimer: I wrote it). Look at the process(float,float) method of any of the classes to see how this is done.
Wah-wah and compression are two completely different effects. Wah-wah is a lowpass filter whose center frequency varies slowly, whereas compression is a method of evening out the volume. The above library has a Compressor class that you can check out.
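To make the sample-in/sample-out pattern concrete, here is a rough single-channel wah-style sketch. This is not code from the linked library; the LFO rates, sweep range, and the cheap resonant filter are all illustrative:

```swift
import Foundation

// A crude wah-wah as a per-sample processor: a resonant low-pass whose
// cutoff is swept by a slow LFO. All constants are illustrative guesses.
final class Wah {
    private let sampleRate: Float
    private var lfoPhase: Float = 0
    private var buf0: Float = 0, buf1: Float = 0

    init(sampleRate: Float) { self.sampleRate = sampleRate }

    func process(_ input: Float) -> Float {
        // Sweep the cutoff between ~400 Hz and ~2000 Hz with a 2 Hz LFO.
        lfoPhase += 2 * .pi * 2 / sampleRate
        if lfoPhase > 2 * .pi { lfoPhase -= 2 * .pi }
        let cutoff: Float = 1200 + 800 * sin(lfoPhase)

        // Cheap resonant low-pass: two cascaded one-poles with feedback.
        let c = 2 * sin(.pi * cutoff / sampleRate)
        let feedback: Float = 0.7 + 0.7 / (1 - c)
        buf0 += c * (input - buf0 + feedback * (buf0 - buf1))
        buf1 += c * (buf0 - buf1)
        return buf1
    }
}
```

Whatever library you settle on, the shape is the same: pull samples out of your guitar recordings, run each one through process(), and hand the results to the audio output.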
The STK does have effects classes as well, not just synthesis classes (JCRev is one for reverb), but I would highly recommend staying away from it, as it is really hard to compile and maintain.
If you haven't seen it already, check out Julius Smith's excellent and comprehensive book, Physical Audio Signal Processing.
I have scoured the net for resources on BPM detection for iOS and tried to implement various techniques, linking against various libraries, etc., but I keep running into either build errors or BPM detection that simply doesn't work.
What are the viable options for basic BPM detection on iOS? It doesn't have to be highly accurate with onset positions; it just needs to detect the BPM for a series of audio buffers.
I tried VAMP but cannot get it to run on iOS, and I've tried various C++ options, but none of them work.
Are there any MIT-licensed BPM detection algorithms that integrate easily with iOS, or any commercial options that don't cost a fortune (it's for a full audio library)? I would like to detect BPM from a file, not through the microphone.
I would just like a BPM detector class, as I don't have the time to learn and implement one myself at this point.
Any help will be greatly appreciated.
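For reference, the usual baseline approach is energy-based onset picking plus inter-onset statistics. Below is a toy sketch of that idea, not production quality and not from any of the libraries mentioned; all constants are illustrative, and real detectors use spectral flux, comb filters, or autocorrelation instead:

```swift
import Foundation

// A toy tempo estimator: mark an "onset" when a block's energy jumps well
// above the recent average, then turn the median inter-onset interval
// into BPM. Constants (block size, 1.5x jump, debounce) are guesses.
func estimateBPM(samples: [Float], sampleRate: Float) -> Float? {
    let block = 1024
    var energies: [Float] = []
    var i = 0
    while i + block <= samples.count {
        var e: Float = 0
        for j in i..<(i + block) { e += samples[j] * samples[j] }
        energies.append(e)
        i += block
    }

    // Onset = energy well above the average of roughly the last second.
    let history = Int(sampleRate / Float(block))
    guard energies.count > history else { return nil }
    var onsets: [Int] = []
    for k in history..<energies.count {
        let recent = energies[(k - history)..<k]
        let mean = recent.reduce(0, +) / Float(history)
        if energies[k] > 1.5 * mean,
           onsets.last.map({ k - $0 > 5 }) ?? true {   // crude debounce
            onsets.append(k)
        }
    }
    guard onsets.count >= 2 else { return nil }

    // Median inter-onset interval, in blocks, converted to BPM.
    let intervals = zip(onsets.dropFirst(), onsets).map { $0 - $1 }.sorted()
    let medianBlocks = Float(intervals[intervals.count / 2])
    let secondsPerBeat = medianBlocks * Float(block) / sampleRate
    return 60 / secondsPerBeat
}
```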
I need to write a speech detection algorithm (not speech recognition).
At first I thought I would just have to measure the microphone power and compare it to some threshold value. But the problem gets much harder once you take the ambient sound level into consideration (for example, in a pub a simple power threshold is crossed immediately because of other people talking).
So in the second version I thought I would have to measure the current power spikes against the average sound level, or something like that. Coding this idea proved quite hairy for me, at which point I decided it might be time to research existing solutions.
Do you know of some general algorithm description for speech detection? Existing code or a library in C/C++/Objective-C is also fine, be it commercial or free.
P.S. I guess there is a difference between "speech" and "sound" recognition, with the first one only responding to frequencies close to the human speech range. I'm fine with the second, simpler case.
The key phrase you need to Google for is Voice Activity Detection (VAD); it's implemented widely in telecoms, particularly for Acoustic Echo Cancellation (AEC).
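As a starting point, here is a minimal sketch of exactly the adaptive-threshold idea from the question: track a slowly updating estimate of the ambient level and call a frame "speech" when its energy sits well above that floor. Real VADs add band filtering (to favor speech frequencies), hangover smoothing, and statistical models; the constants here are guesses:

```swift
import Foundation

// A crude energy VAD with an adaptive noise floor. The 4x margin
// (~6 dB above the floor) and the 0.95/0.05 smoothing are illustrative.
final class EnergyVAD {
    private var noiseFloor: Float = 0
    private var primed = false

    func isSpeech(frame: [Float]) -> Bool {
        var energy: Float = 0
        for s in frame { energy += s * s }
        energy /= Float(max(frame.count, 1))

        if !primed {             // seed the floor from the first frame
            noiseFloor = energy
            primed = true
            return false
        }

        let active = energy > 4 * noiseFloor
        if !active {
            // Only adapt the floor during silence, and only slowly,
            // so speech itself doesn't drag the threshold upward.
            noiseFloor = 0.95 * noiseFloor + 0.05 * energy
        }
        return active
    }
}
```

Because the floor adapts to the ambient level, the same detector tolerates both a quiet room and the pub scenario from the question, at the cost of a short settling period at startup.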