I have managed to record the user's voice following the approach in the link below. The recording plays back fine except for the first sounds the user says: if I say "Hello" it only records "llo", because I compare the power level against a threshold to decide when to start recording. Could anyone guide me on how to recover those initial letters so the recording is complete?
Making of Talking App
Thanks!
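One common fix (my suggestion, not from the linked post) is to keep buffering audio while idle and, once the level finally crosses the threshold, prepend the last few hundred milliseconds to the take. A minimal Swift sketch; the pre-roll length and threshold are illustrative values to tune:

    // Sketch of a pre-roll buffer: always keep the most recent ~300 ms of audio,
    // and prepend it to the recording when the level threshold trips.
    struct PreRollRecorder {
        private var preRoll: [Float] = []        // rolling window of recent samples
        private let preRollCapacity: Int
        private(set) var recording: [Float] = []
        private var triggered = false
        let threshold: Float = 0.02              // RMS level that starts the take (tune this)

        init(sampleRate: Double, preRollSeconds: Double = 0.3) {
            preRollCapacity = Int(sampleRate * preRollSeconds)
        }

        mutating func process(buffer: [Float]) {
            guard !buffer.isEmpty else { return }
            if triggered {
                recording.append(contentsOf: buffer)
                return
            }
            // While idle, keep only the newest `preRollCapacity` samples.
            preRoll.append(contentsOf: buffer)
            if preRoll.count > preRollCapacity {
                preRoll.removeFirst(preRoll.count - preRollCapacity)
            }
            let rms = (buffer.reduce(0) { $0 + $1 * $1 } / Float(buffer.count)).squareRoot()
            if rms > threshold {
                triggered = true
                recording = preRoll   // includes the current buffer plus the preceding pre-roll,
            }                         // recovering the clipped onset ("He" in "Hello")
        }
    }

Because the pre-roll is captured before the detector fires, the quiet attack of the first word is already in the buffer when recording "starts".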
I work on software that processes short speech recordings, usually no more than a sentence or two. These recordings are sent to my software as WAV files. Sometimes, due to user error, a recording is cut off mid-sentence (the user starts/stops recording with a button and sometimes mistimes it). E.g. a user was saying "I want to buy 3 apples", but due to a mistimed release of the recording button it comes out as "I want to buy 3 ap".
I wonder if there is an algorithm, ML or not, that would allow me to filter out such cases. I accept that it won't be 100% accurate, but I want to reduce the number of false positives as much as possible.
Maybe something based on the abrupt end of the voice, rather than the slower dying away of the voice at the end of a sentence?
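A rough Swift sketch of that idea: compare the energy of the last few milliseconds against the speech level just before it, and flag files that are still "loud" at the very last sample. All window sizes and thresholds here are guesses you would need to tune on real data:

    // Heuristic: natural sentence endings decay; an abrupt cut keeps near-full
    // energy right up to the last sample. `samples` is mono PCM in [-1, 1].
    func looksCutOff(samples: [Float], sampleRate: Double) -> Bool {
        let window = Int(sampleRate * 0.05)              // 50 ms analysis window
        guard samples.count >= window * 4 else { return false }
        func rms(_ slice: ArraySlice<Float>) -> Float {
            let s = slice.reduce(Float(0)) { $0 + $1 * $1 }
            return (s / Float(slice.count)).squareRoot()
        }
        let tail = rms(samples.suffix(window))                        // final 50 ms
        let body = rms(samples.dropLast(window).suffix(window * 10))  // preceding ~500 ms
        // Flag when the file ends while energy is still close to the speech level.
        return body > 0.01 && tail > body * 0.6
    }

This won't catch a recording that happens to be cut during a pause, so it only reduces false negatives; an ML classifier trained on labeled cut/complete clips could use this tail-to-body energy ratio as one of its features.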
OK guys, I'm literally having a breakdown right now. I just can't hear my voice or guitar while I'm recording, in both Logic and GarageBand. I simplified the setup by removing my audio interface, and I can't even hear my voice through the built-in microphone.
I have :
MacBook: System Preferences > Output: internal speakers & Input: internal microphone
GarageBand: Preferences > Audio/MIDI > Output device: built-in input & Input device: built-in microphone
The Monitoring button is on for the track I want to record. I had some great ideas I just wanted to put down quickly.
HELP SERIOUSLY.
Oh, not sure if it matters, but I'm currently creating a Time Machine backup to a hard drive.
Since GarageBand is a commercial application for recording audio, I'm sure its makers can help you get this sorted out. See, for example, the online help for "GarageBand for Mac: Turn on input monitoring for audio tracks". Recording one input into two applications might be a tall order.
If you are simply trying to test whether your audio system can record and output at the same time, there are audio applications for that. I can't help you with the Mac, but on a Linux machine we have, for example, the command
arecord -f dat | aplay
which will immediately play back all mic input.
For a project I need to handle audio in an iPhone app in a rather special way, and I hope somebody may point me in the right direction.
Let's say you have a fixed set of up to thirty audio files of the same length (2-3 sec, uncompressed). While a cue is playing from one audio file, it should be possible to update parameters so that playback continues from another audio file at the same timestamp where the previous audio file left off. If the different audio files are different heavily filtered versions of the same audio, it should be possible to "slide" between them and get the impression that you applied the filter directly. The filtering is at the moment not possible to achieve in real time on an iPhone, hence the pre-rendered files.
If A, B and C are different audio files, I would like to be able to:
Play A without interruption:
Start AAAAAAAAAAAAA Stop
Or start playing A and continue into B and then C, initiated while playing:
Start AAABBBBBBBBCC Stop
Ideally it should be possible to play two or more cues at the same time. Latency is not that important, but switching between files should ideally not produce clicks or delays.
I have looked into Audio Queue Services (which looks like hell to dive into) and sniffed around OpenAL. Could anyone give me a rough overview and a general direction I can spend the next days buried in?
Try the iOS Audio Unit API, in particular a mixer unit connected to RemoteIO for audio output.
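For what it's worth, here is a sketch of that idea using the higher-level AVAudioEngine wrapper rather than the raw Audio Unit C API (file names are made up): attach one player node per pre-rendered file, start them all on the same device clock so their timelines stay sample-locked, and "switch" files by swapping volumes at the mixer instead of seeking:

    import AVFoundation

    let engine = AVAudioEngine()
    let urls = ["A", "B", "C"].map { Bundle.main.url(forResource: $0, withExtension: "caf")! }
    let files = urls.map { try! AVAudioFile(forReading: $0) }
    let players = files.map { _ in AVAudioPlayerNode() }

    for (player, file) in zip(players, files) {
        engine.attach(player)
        engine.connect(player, to: engine.mainMixerNode, format: file.processingFormat)
        player.scheduleFile(file, at: nil, completionHandler: nil)  // all queued from sample 0
    }
    try! engine.start()

    // Start every player at the same host time so they stay in lockstep.
    let start = AVAudioTime(hostTime: mach_absolute_time() + 50_000_000) // a moment from now
    for (i, player) in players.enumerated() {
        player.volume = (i == 0) ? 1 : 0     // only A is audible at first
        player.play(at: start)
    }

    // "Continuing in B" is then just a volume swap: no seek, no restart click.
    players[0].volume = 0
    players[1].volume = 1

Ramping the volumes over a few milliseconds instead of flipping them instantly avoids clicks at the switch point; this is essentially the same trick as the FMOD answer below, done by hand.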
I managed to do this using FMOD Designer. FMOD (http://www.fmod.org/) is a sound design framework for game development that supports iOS. I made a multitrack event in FMOD Designer with a different layer for each sound clip, and added a parameter on the horizontal bar that lets you control in real time which sound clip plays. The trick is to let each sound clip run across the whole bar and control which one is heard using a volume effect (0-100%), as in the attached picture. That way you are ensured that switching between files follows the same timecode. I tried this successfully with up to thirty layers, but experienced some double playing; that seemed to disappear when I cut the number down to fifteen.
It should be possible to do this with the iOS Audio Unit API if you are comfortable with it, but for those of us who prefer the simplest solution, FMOD is quite good :) Thanks to Ellen S for the tip!
Screenshot of the multitrack event in FMOD Designer:
https://plus.google.com/photos/106278910734599034045/albums/5723469198734595793?authkey=CNSIkbyYw8PM2wE
I have programmed a voice recognition application, and I am having problems with the mic hearing me over the computer playing music. I need software that can filter the sound leaving the speakers out of the sound entering the mic.
Is there software or a component (for Delphi) that would solve my problem?
You need to capture:
the computer output
the mic input
Then you need to find two parameters, which depend on your mic location and the sound system delay. These two parameters are n (delay) and k (amplification), such that
Stream1[t+n] * k = Stream2[t]
where t is time. Once you have found these parameters, the resulting stream containing only the speech picked up by the mic is
Stream2[t] - Stream1[t+n] * k = MusicReductionStream[t]
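A rough Swift sketch of that model (my illustration; one delay tap and one gain only, whereas a real room needs a full impulse response, as a later answer points out). It estimates n by brute-force cross-correlation, fits k by least squares at that alignment, then forms the reduction stream from the formula above:

    // stream1 = what the computer played, stream2 = what the mic heard.
    func cancelMusic(stream1: [Float], stream2: [Float], maxDelay: Int) -> [Float] {
        // 1. Find the delay n that best aligns Stream1[t+n] with Stream2[t].
        var bestN = 0
        var bestCorr = -Float.greatestFiniteMagnitude
        for n in 0...maxDelay {
            let len = max(0, min(stream1.count - n, stream2.count))
            var corr: Float = 0
            for t in 0..<len { corr += stream1[t + n] * stream2[t] }
            if corr > bestCorr { bestCorr = corr; bestN = n }
        }
        // 2. Least-squares fit of the gain k at that alignment.
        let len = max(0, min(stream1.count - bestN, stream2.count))
        var num: Float = 0
        var den: Float = 1e-9
        for t in 0..<len {
            num += stream1[t + bestN] * stream2[t]
            den += stream1[t + bestN] * stream1[t + bestN]
        }
        let k = num / den
        // 3. MusicReductionStream[t] = Stream2[t] - Stream1[t+n] * k
        return (0..<stream2.count).map { (t: Int) -> Float in
            let i = t + bestN
            return i < stream1.count ? stream2[t] - k * stream1[i] : stream2[t]
        }
    }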
I think you want to do what noise-cancelling microphones do. Those systems use at least one extra microphone to compute the difference between the "surrounding noise" and the sound aimed directly at the microphone (the speech it has to register). I don't think you can reliably obtain the same effect with a software-only solution.
A first step would obviously be to turn the music down :-)
Check out the AsioVST library.
100% open source Delphi code
Free
Very complete
Active (support for XE2 / x64 is being added, for example)
Under Examples\Plugins\Crosstalk Cancellation\ you'll find the source code for a plugin that probably does what you're looking for.
The magic happens in DAV_DspCrosstalkCancellation.pas.
I think the Speex preprocessor has an echo-cancellation feature. You'll need to feed it the audio data you recorded and the audio you want to cancel, and it will try to remove the latter.
The main problem is finding out what audio your computer is playing; I'm not sure there is a good API for that.
Speex also has a noise-reduction feature and voice activity detection. You can compile it as a DLL and then write a Delphi header for it.
You need to estimate the impulse response of the speaker, the room, etc., which can change with the exact speaker and mic positioning and with the size and contents of the room, as well as knowing or estimating the system delay.
If the person or the mic can move, the impulse response and delay will need to be continually re-estimated, as in the adaptive-filter sketch below.
Once you have estimated the impulse response, you can convolve it with the output signal and try to subtract delayed versions of the result from the mic input until you can null the silent portions of the speech input. Cross-correlation might be useful for estimating the delay.
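For the continual re-estimation case, the standard tool is an adaptive filter such as NLMS (my suggestion; the answer above doesn't name one), which keeps refining its estimate of the impulse response as the echo path drifts. A minimal offline Swift sketch:

    // Minimal NLMS adaptive echo canceller.
    // x: far-end (speaker) samples, d: mic samples.
    // Returns the error signal e = d - estimated echo, i.e. mostly the speech.
    func nlmsCancel(farEnd x: [Float], mic d: [Float], taps: Int = 256,
                    mu: Float = 0.5, eps: Float = 1e-6) -> [Float] {
        var w = [Float](repeating: 0, count: taps)   // running impulse-response estimate
        var e = [Float](repeating: 0, count: d.count)
        for n in 0..<d.count {
            // Echo estimate: convolve the last `taps` far-end samples with w.
            var y: Float = 0
            var power: Float = eps
            for k in 0..<taps where n - k >= 0 {
                y += w[k] * x[n - k]
                power += x[n - k] * x[n - k]
            }
            e[n] = d[n] - y                          // mic minus estimated echo
            let g = mu * e[n] / power                // normalized step size
            for k in 0..<taps where n - k >= 0 {
                w[k] += g * x[n - k]                 // nudge weights toward the true response
            }
        }
        return e
    }

The filter length `taps` must cover the system delay plus the room reverb tail, and the direct double loop is O(N·taps), so a production version would use a block-frequency-domain implementation; this sketch is only meant to show the adaptation step.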
I'm designing a simple proof of concept for a multitrack recorder.
The obvious starting point is to play from file A.caf to the headphones while simultaneously recording microphone input into file B.caf.
This question -- Record and play audio Simultaneously -- points out that there are three levels at which I can work:
AVFoundation API (AVAudioPlayer + AVAudioRecorder)
Audio Queue API
Audio Unit API (RemoteIO)
What is the best level to work at? Obviously the generic answer is to work at the highest level that gets the job done, which would be AVFoundation.
But I'm taking this job over from someone who gave up due to latency issues (he was getting a 0.3 s delay between the files), so maybe I need to work at a lower level to avoid those issues?
Furthermore, what source code is available to springboard from? I have been looking at the SpeakHere sample (http://developer.apple.com/library/ios/#samplecode/SpeakHere/Introduction/Intro.html). If I can't find something simpler I will use it.
But can anyone suggest something simpler, or something else? I would rather not work with C++ code if I can avoid it.
Is anyone aware of some public code that uses AVFoundation to do this?
EDIT: AVFoundation example here: http://www.iphoneam.com/blog/index.php?title=using-the-iphone-to-record-audio-a-guide&more=1&c=1&tb=1&pb=1
EDIT(2): Much nicer looking one here: http://www.switchonthecode.com/tutorials/create-a-basic-iphone-audio-player-with-av-foundation-framework
EDIT(3): How do I record audio on iPhone with AVAudioRecorder?
To avoid the latency issues, you will have to work at a lower level than AVFoundation, all right. Check out this sample code from Apple: aurioTouch. It uses Remote I/O.
As suggested by Viraj, here is the answer.
Yes, you can achieve very good results using AVFoundation. First, you need to pay attention to the fact that for both the player and the recorder, activation is a two-step process:
First you prime it.
Then you play it.
So, prime everything. Then play everything.
This will get your latency down to about 70ms. I tested by recording a metronome tick, then playing it back through the speakers while holding the iPhone up to the speakers and simultaneously recording.
The second recording had a clear echo, which I found to be ~70ms. I could have analysed the signal in Audacity to get an exact offset.
So, in order to line everything up, I just call performSelector:x withObject:y afterDelay:70.0/1000.0.
There may be hidden snags; for example, the delay may differ from device to device, and may even differ depending on device activity. It is even possible that the thread could get interrupted/rescheduled between starting the player and starting the recorder.
But it works, and is a lot tidier than messing around with audio queues / units.
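A Swift sketch of that prime-then-start pattern (paths and recorder settings are illustrative; error handling omitted). As an alternative to performSelector:afterDelay:, both objects can be scheduled against the shared device clock with play(atTime:) / record(atTime:), which sidesteps the thread-scheduling snag mentioned above:

    import AVFoundation

    let session = AVAudioSession.sharedInstance()
    try! session.setCategory(.playAndRecord, options: [.defaultToSpeaker])
    try! session.setActive(true)

    let docs = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
    let player = try! AVAudioPlayer(contentsOf: docs.appendingPathComponent("A.caf"))
    let recorder = try! AVAudioRecorder(url: docs.appendingPathComponent("B.caf"),
                                        settings: [AVFormatIDKey: kAudioFormatLinearPCM])

    // Step 1: prime everything.
    player.prepareToPlay()
    recorder.prepareToRecord()

    // Step 2: start everything at the same point on the device clock, a beat in
    // the future, so both sides begin together instead of in call order.
    let startAt = player.deviceCurrentTime + 0.1
    player.play(atTime: startAt)
    recorder.record(atTime: startAt)

Any remaining fixed offset (like the ~70 ms measured above) can then be trimmed from the recorded file when mixing.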
I had this problem, and I solved it in my project simply by changing the PreferredHardwareIOBufferDuration parameter of the AudioSession. I think I have only about 6 ms of latency now, which is good enough for my app.
Check this answer, which has a good explanation.
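For reference, the same fix via the modern AVAudioSession API (the 6 ms figure is from the answer above; the hardware may round whatever you request):

    import AVFoundation

    let session = AVAudioSession.sharedInstance()
    try! session.setCategory(.playAndRecord)
    try! session.setPreferredIOBufferDuration(0.006)   // ask for ~6 ms buffers
    try! session.setActive(true)
    print("granted IO buffer duration:", session.ioBufferDuration)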