How do I play multiple audio files and change their volume using Novocaine?
Thanks!
There is a similar question that I wrote quite a lengthy response to:
Using Novocaine in an audio app
Basically, playing multiple sounds at once involves mixing the various sounds down sample by sample. Changing the volume involves multiplying the samples in the audio buffer by some amplitude value; that is, if you want your output to be twice as loud, simply multiply every sample by 2.0f. The Accelerate framework can help you with this.
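For example, here's a minimal sketch of that idea using Accelerate's vDSP (my own illustration, assuming two already-decoded Float32 buffers of equal length, not Novocaine-specific code):

```swift
import Accelerate

// Mix two sounds sample by sample and scale the result by an output gain.
// e.g. gain = 2.0 makes the mixed output twice as loud.
func mix(_ a: [Float], _ b: [Float], gain: Float) -> [Float] {
    var g = gain
    var out = [Float](repeating: 0, count: min(a.count, b.count))
    vDSP_vasm(a, 1, b, 1, &g, &out, 1, vDSP_Length(out.count)) // out[n] = (a[n] + b[n]) * g
    return out
}
```

In a Novocaine output block you would typically write the result straight into the supplied output buffer rather than returning a new array.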
I would like to use AVFoundation for microphone input for speech detection, as shown in this iOS example, and simultaneously detect the pitch of the user's voice through the same microphone input using AudioKit. The latter API is probably a wrapper around the former, but has its own classes and initialization. Is there a way to provide AudioKit with an existing microphone configuration, as in the speech example, or some alternative way to use the Speech API and AudioKit's microphone pitch detection simultaneously? How might I achieve this?
EDIT: The question is a little more complex
I need to be able to synchronize 3 things: touch events, AudioKit detection times, and speech detection times. Each of these operates on a different timebase. Speech gives me segment timestamps relative to the beginning of the audio recording; the timestamps for UITouch events will be different. I am not sure what AudioKit uses for its timestamps. There is some mention of host time and AV timestamps here, but I'm not sure this will get me anywhere.
Synchronizing speech and audio is still a little unclear to me. Can anyone give me a lead on how this might work?
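Not an answer from the original thread, but one possible (untested) approach to sketch: convert every event to host-time seconds and compare them on that single clock. The helper names below are illustrative assumptions.

```swift
import AVFoundation

// Captured once, e.g. when you start the audio engine / speech request.
let recordingStartHostSeconds = AVAudioTime.seconds(forHostTime: mach_absolute_time())

// AudioKit / AVAudioEngine taps hand you an AVAudioTime that carries a hostTime.
func seconds(of audioTime: AVAudioTime) -> TimeInterval {
    AVAudioTime.seconds(forHostTime: audioTime.hostTime)
}

// Speech segment timestamps are offsets from the start of the audio you fed it,
// so add them to the moment recording started.
func seconds(ofSpeechOffset offset: TimeInterval) -> TimeInterval {
    recordingStartHostSeconds + offset
}

// UITouch.timestamp is seconds since boot (the same reference as
// ProcessInfo.processInfo.systemUptime), which should line up with the values
// above, but that is worth verifying on a device.
```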
I am working on a convolutional neural network that takes an audio spectrogram as input to discriminate between music and speech, using the GTZAN dataset.
If the individual clips are shorter, this gives more training samples overall; but if the clips are too short, they may lack important features?
How much data is needed for recognizing if a piece of audio is music or speech?
How long should the audio samples be ideally?
The ideal clip length depends on a number of factors.
The basic idea is to capture just enough of the signal.
Since audio changes constantly, it is preferable to work on shorter segments. However, a very small frame would capture few or no useful features.
On the other hand, a very long clip would capture too many features, adding complexity.
So, in most use cases the ideal audio length is around 25 seconds, but that is not a hard rule and you may adjust it as needed. Just make sure the frame size is neither very small nor very large.
Update for the dataset
Check this link for a dataset of 30-second clips.
How much data is needed for recognizing if a piece of audio is music or speech?
If someone knew the answer to this question exactly then the problem would be solved already :)
But seriously, it depends on what your downstream application will be. Imagine trying to discriminate between speech with background music vs. a cappella singing (hard) or classifying orchestral music vs. audio books (easy).
How long should the audio samples be ideally?
Like everything in machine learning, it depends on the application. For you, I would say test with at least 10, 20, and 30 secs, or something like that. You are correct in that the spectral values can change rather drastically depending on the length!
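To make the length trade-off concrete, here is a rough back-of-the-envelope sketch (the sample rate matches GTZAN's 22.05 kHz; the FFT size and hop length are just illustrative assumptions):

```swift
// Number of STFT frames (spectrogram time steps) produced by a clip of a given length.
func spectrogramFrames(clipSeconds: Double,
                       sampleRate: Double = 22_050,  // GTZAN clips are 22.05 kHz
                       fftSize: Int = 2_048,
                       hop: Int = 512) -> Int {
    let samples = Int(clipSeconds * sampleRate)
    guard samples >= fftSize else { return 0 }
    return (samples - fftSize) / hop + 1
}

// 10 s -> ~427 frames, 20 s -> ~858 frames, 30 s -> ~1288 frames,
// so the CNN's input width (and how much context it sees) changes substantially.
```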
I have a stereo audio file whose channels I am trying to separate, resulting in one bus with the left audio signal and one bus with the right audio signal. I want to perform some operations on these channels and then merge them again into a single stereo signal.
Reading the audio file, doing operations on a bus, and merging it into a single signal is no problem (though I haven't tested whether that signal is stereo, as I guess it depends on the prior channels).
My problem is in separating the left and the right channel, so I can independently modify them.
One of my ideas was to use the pan property of AVAudioPlayerNode to pan the signal fully left/right, but it seems (as mentioned in the documentation) that this property is not yet implemented for AVAudioPlayerNode (even though it is used in all the examples in the WWDC videos).
Another solution I found was this: using memcpy to create new buffers. I haven't tried it yet, as I guess it takes quite some time and is not suitable for a normal player.
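For reference, here is a rough sketch of that buffer-copy idea (my own illustration, not the code behind the link), using AVAudioPCMBuffer's per-channel pointers rather than raw memcpy:

```swift
import AVFoundation

// Copy each channel of a de-interleaved stereo buffer into its own mono array.
func splitChannels(of buffer: AVAudioPCMBuffer) -> (left: [Float], right: [Float])? {
    guard let channels = buffer.floatChannelData, buffer.format.channelCount == 2 else { return nil }
    let frames = Int(buffer.frameLength)
    let left  = Array(UnsafeBufferPointer(start: channels[0], count: frames))
    let right = Array(UnsafeBufferPointer(start: channels[1], count: frames))
    return (left, right)
}
```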
Third, there is a framework called AudioKit. It provides the option of converting the stream to mono left/right channels and then merging the signal again by creating an AKStereoOperation. My problem with this solution is that separating the audio channels is quite a simple use case, and I find it hard to justify including such a huge framework for it, even though it would probably work (not tested).
Is there a simple way to separate the channels?
Thanks!
You can take the raw samples and create an AudioConverter using AudioConverterNew, with an output AudioStreamBasicDescription configured to de-interleave the stereo channels. This will result in one buffer with the first channel stored contiguously, followed by the second channel in the second half of the buffer.
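A minimal sketch of how that converter might be set up (assuming 32-bit float PCM at 44.1 kHz; error handling omitted):

```swift
import AudioToolbox

let sampleRate: Float64 = 44_100

// Source format: standard interleaved stereo float PCM.
var interleaved = AudioStreamBasicDescription(
    mSampleRate: sampleRate,
    mFormatID: kAudioFormatLinearPCM,
    mFormatFlags: kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked,
    mBytesPerPacket: 8,              // 2 channels x 4 bytes
    mFramesPerPacket: 1,
    mBytesPerFrame: 8,
    mChannelsPerFrame: 2,
    mBitsPerChannel: 32,
    mReserved: 0)

// Destination format: the same PCM, but flagged as non-interleaved so the
// converter writes each channel's samples out separately.
var deinterleaved = interleaved
deinterleaved.mFormatFlags |= kAudioFormatFlagIsNonInterleaved
deinterleaved.mBytesPerPacket = 4    // per channel when non-interleaved
deinterleaved.mBytesPerFrame = 4

var converter: AudioConverterRef?
if AudioConverterNew(&interleaved, &deinterleaved, &converter) == noErr {
    // Push interleaved frames through AudioConverterConvertComplexBuffer (or
    // AudioConverterFillComplexBuffer) to get the separated channel data out.
}
```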
I am building an iOS app that allows the user to play guitar sounds - e.g. plucking or strumming.
I'd like to allow the user to apply pitch shifting or wah-wah (compression) on the guitar sound being played.
Currently, I am using audio samples of the guitar sound.
I've done some basic reading on DSP and audio synthesis, but I'm no expert in it. I have seen libraries such as Csound and the STK, and it appears that the sounds they produce are synthesized (i.e. not played from audio samples). I am not sure how to apply them, or whether I can use them to apply effects such as pitch shifting or wah-wah to audio samples.
Can someone point me in the right direction for this?
You can use open-source audio processing libraries. Essentially, you are getting audio samples in, and you need to process them and send them back out as samples. The processing can be done by these libraries, or you can write your own. Here's one DSP library (disclaimer: I wrote this). Look at the process(float,float) method of any of the classes to see how this is done.
Wah-wah and compression are two completely different effects. Wah-wah is a lowpass filter whose center frequency varies slowly, whereas compression is a method of evening out the volume. The above library has a Compressor class that you can check out.
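To make the per-sample processing idea concrete, here is a rough sketch (not code from the linked library) of a lowpass filter whose cutoff is slowly swept by an LFO, which is the essence of a wah; a real wah would typically use a resonant filter:

```swift
import Foundation

final class SimpleWah {
    private let sampleRate: Float
    private var lfoPhase: Float = 0   // phase of the sweep oscillator
    private var y: Float = 0          // previous output (filter state)

    init(sampleRate: Float = 44_100) {
        self.sampleRate = sampleRate
    }

    // Called once per input sample, in the spirit of the process(...) methods mentioned above.
    func process(_ x: Float) -> Float {
        // Sweep the cutoff between roughly 400 Hz and 2 kHz at 2 Hz.
        let sweep = 0.5 * (1 + sinf(lfoPhase))
        lfoPhase += 2 * Float.pi * 2 / sampleRate
        let cutoff = 400 + 1_600 * sweep

        // One-pole lowpass with a time-varying coefficient.
        let a = 1 - expf(-2 * Float.pi * cutoff / sampleRate)
        y += a * (x - y)
        return y
    }
}
```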
The STK does have effects classes as well, not just synthesis classes (JCRev is one, for reverb), but I would highly recommend staying away from them, as they are really hard to compile and maintain.
If you haven't seen it already, check out Julius Smith's excellent and comprehensive book Physical Audio Signal Processing.
I'm looking to build a really simple EQ that plays a filtered version of a song in the user's library. It would essentially be a parametric EQ: I'd specify the bandwidth, cut/boost (in dB), and centre frequency, and then be returned some object that I could play just like my original MPMediaItem.
For MPMediaItems, I've generally used AVAudioPlayer in the past with great success. For audio generation, I've used Audio Units. In MATLAB, I'd probably just create custom filters to do this. I'm at a bit of a loss for how to approach this in iOS! Any pointers would be terrific. Thanks for reading!
iOS ships with a fairly sizeable number of audio units. One of kAudioUnitSubType_ParametricEQ, kAudioUnitSubType_NBandEQ, or kAudioUnitSubType_BandPassFilter is probably what you want, depending on whether you want to control Q as well as the centre frequency and gain.
I suspect you will have to forgo using higher-level components such as AVAudioPlayer to make use of it.
The relevant iOS audio unit reference can be found here.
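For reference, here is a minimal sketch of instantiating the parametric EQ unit with the C Audio Unit API (parameter values are arbitrary; error handling and graph wiring are omitted):

```swift
import AudioToolbox

var desc = AudioComponentDescription(componentType: kAudioUnitType_Effect,
                                     componentSubType: kAudioUnitSubType_ParametricEQ,
                                     componentManufacturer: kAudioUnitManufacturer_Apple,
                                     componentFlags: 0,
                                     componentFlagsMask: 0)

if let component = AudioComponentFindNext(nil, &desc) {
    var unit: AudioUnit?
    AudioComponentInstanceNew(component, &unit)
    if let eq = unit {
        AudioUnitInitialize(eq)
        // Parameter IDs come from AudioUnitParameters.h.
        AudioUnitSetParameter(eq, kParametricEQParam_CenterFreq, kAudioUnitScope_Global, 0, 1_000, 0) // Hz
        AudioUnitSetParameter(eq, kParametricEQParam_Q,          kAudioUnitScope_Global, 0, 2.0,   0)
        AudioUnitSetParameter(eq, kParametricEQParam_Gain,       kAudioUnitScope_Global, 0, 6.0,   0) // dB
        // The unit then has to be wired between your source and the output,
        // e.g. in an AUGraph or a render callback chain, to actually hear it.
    }
}
```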