I have a stereo audio file whose channels I am trying to separate, so that I end up with one bus carrying the left audio signal and one carrying the right. I want to apply some operations to these channels independently and then merge them back into a single stereo signal.
Reading the audio file, performing operations on a bus, and merging buses into a single signal is no problem (though I haven't tested whether that signal is stereo, as I guess it depends on the incoming channels).
My problem is separating the left and right channels so that I can modify them independently.
One of my ideas was to use the pan property of AVAudioPlayerNode to pan the signal fully left/right, but it seems (as mentioned in the documentation) that this property is not yet implemented in AVAudioPlayerNode, even though it is used in all the examples in the WWDC videos.
Another solution I found was this: using memcpy to create new buffers. I haven't tried it yet, as I guess it takes quite some time and is not suitable for a normal player.
Third, there is a framework called AudioKit. It provides the option of converting the stream to mono left/right channels and then merging the signal again by creating an AKStereoOperation. My problem with this solution is that separating the audio channels is quite a simple use case, and I find it hard to justify including such a huge framework for it, even though it would probably work (not tested).
Is there a simple way to separate the channels?
Thanks!
You can take the raw samples and create an AudioConverter using AudioConverterNew with an output AudioStreamBasicDescription configured to de-interleave the stereo channels. This will result in a buffer with one channel stored contiguously first and the second channel in the second half of the buffer.
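A minimal sketch of that idea (Swift; the 44.1 kHz / 16-bit PCM format values are assumptions, and error handling is mostly omitted). The output format simply adds kAudioFormatFlagIsNonInterleaved so the converter writes each channel's samples separately:

```swift
import AudioToolbox

// Hedged sketch: build a de-interleaving AudioConverter.
// Sample rate and sample format below are assumptions (44.1 kHz, 16-bit signed PCM).
var interleaved = AudioStreamBasicDescription(
    mSampleRate: 44_100,
    mFormatID: kAudioFormatLinearPCM,
    mFormatFlags: kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked,
    mBytesPerPacket: 4,            // 2 channels x 2 bytes, interleaved
    mFramesPerPacket: 1,
    mBytesPerFrame: 4,
    mChannelsPerFrame: 2,
    mBitsPerChannel: 16,
    mReserved: 0)

// Same format, but non-interleaved: byte counts now describe a single channel.
var deinterleaved = interleaved
deinterleaved.mFormatFlags |= kAudioFormatFlagIsNonInterleaved
deinterleaved.mBytesPerPacket = 2
deinterleaved.mBytesPerFrame = 2

var converter: AudioConverterRef?
let status = AudioConverterNew(&interleaved, &deinterleaved, &converter)
precondition(status == noErr, "AudioConverterNew failed: \(status)")

// Feeding interleaved frames through AudioConverterConvertComplexBuffer then yields
// an AudioBufferList whose buffers hold the left and right samples separately
// (they can point into one contiguous allocation, as described above).
```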
I am working on a convolutional neural net that takes an audio spectrogram as input to discriminate between music and speech, using the GTZAN dataset.
If the individual samples are shorter, this gives more samples overall; but if samples are too short, do they lack important features?
How much data is needed for recognizing if a piece of audio is music or speech?
How long should the audio samples be ideally?
The appropriate audio length depends on a number of factors.
The basic idea is to use just enough samples.
Since audio changes constantly, it is preferable to work on shorter segments; however, a very small frame may capture few or no useful features.
On the other hand, a very large sample captures too many features, leading to complexity.
So in most use cases, although 25 seconds is the ideal audio length, it is not a written rule and you may adjust it accordingly; just make sure the frame size is neither very small nor very large.
Update on the dataset:
Check this link for a dataset of 30-second clips.
How much data is needed for recognizing if a piece of audio is music or speech?
If someone knew the answer to this question exactly then the problem would be solved already :)
But seriously, it depends on what your downstream application will be. Imagine trying to discriminate between speech with background music vs. a cappella singing (hard), or classifying orchestral music vs. audio books (easy).
How long should the audio samples be ideally?
Like everything in machine learning, it depends on the application. For you, I would say test with at least 10, 20, and 30 secs, or something like that. You are correct in that the spectral values can change rather drastically depending on the length!
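As a rough illustration of that trade-off, here is a small sketch (Swift; it assumes GTZAN's 22,050 Hz, 30-second clips) showing how the chosen segment length changes how many training examples each clip yields:

```swift
// Slice one 30 s GTZAN clip into fixed-length segments.
// Shorter segments give more training examples, but each carries less context.
let sampleRate = 22_050          // GTZAN clips are 22.05 kHz mono
let clipSeconds = 30
let segmentSeconds = 5           // try 5, 10, 20, 30 ... and compare accuracy

let samplesPerSegment = sampleRate * segmentSeconds
let segmentsPerClip = (sampleRate * clipSeconds) / samplesPerSegment
print("\(segmentsPerClip) segments of \(segmentSeconds) s each")   // 6 segments of 5 s each
```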
I am building an iOS app that allows the user to play guitar sounds - e.g. plucking or strumming.
I'd like to allow the user to apply pitch shifting or wah-wah (compression) on the guitar sound being played.
Currently, I am using audio samples of the guitar sound.
I've done some basic reading on DSP and audio synthesis, but I'm no expert in it. I've seen libraries such as Csound and STK, and it appears that the sounds they produce are synthesized (i.e. not played from audio samples). I am not sure how to apply them, or whether I can use them to apply effects such as pitch shifting or wah-wah to audio samples.
Can someone point me in the right direction for this?
You can use open-source audio processing libraries. Essentially, you get audio samples in, you process them, and you send them out as samples. The processing can be done by these libraries, or you can use your own. Here's one DSP library (disclaimer: I wrote it). Look at the process(float,float) method of any of the classes to see how this is done.
Wah-wah and compression are two completely different effects. Wah-wah is a lowpass filter whose center frequency varies slowly, whereas compression is a method of evening out the volume. The above library has a Compressor class that you can check out.
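To make the "samples in, samples out" idea concrete, here is a hypothetical per-sample compressor in Swift; it is not the linked library's API, and the threshold and ratio are placeholder values:

```swift
// Hypothetical feed-forward compressor: attenuate whatever exceeds the threshold.
struct SimpleCompressor {
    var threshold: Float = 0.5   // linear amplitude above which we compress (assumed)
    var ratio: Float = 4.0       // 4:1 compression above the threshold

    func process(_ input: Float) -> Float {
        let magnitude = abs(input)
        guard magnitude > threshold else { return input }
        // Keep the portion below the threshold, shrink only the excess by the ratio.
        let compressed = threshold + (magnitude - threshold) / ratio
        return input < 0 ? -compressed : compressed
    }
}

// Usage: run every incoming sample through the effect before writing it out.
let comp = SimpleCompressor()
print(comp.process(0.9))         // 0.6 instead of 0.9
```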
The STK does have effect classes as well, not just synthesis classes; JCRev is one for reverb. But I would highly recommend staying away from it, as it is really hard to compile and maintain.
If you haven't seen it already, check out Julius Smith's excellent and comprehensive book Physical Audio Signal Processing.
I'm looking to build a really simple EQ that plays a filtered version of a song in the user's library. It would essentially be a parametric EQ: I'd specify the bandwidth, cut/boost (in dB), and centre frequency, and then be returned some object that I could play just like my original MPMediaItem.
For MPMediaItems, I've generally used AVAudioPlayer in the past with great success. For audio generation, I've used Audio Units. In MATLAB, I'd probably just create custom filters to do this. I'm at a bit of a loss for how to approach this in iOS! Any pointers would be terrific. Thanks for reading!
iOS ships with a fairly sizeable number of audio units. One of kAudioUnitSubType_ParametricEQ, kAudioUnitSubType_NBandEQ, or kAudioUnitSubType_BandPassFilter is probably what you want, depending on whether you want to control Q as well as Fc and gain.
I suspect you will have to forego using higher-level components such as AVAudioPlayer to make use of it.
The relevant iOS audio unit reference can be found here
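As a sketch of what instantiating one of those units might look like (Swift, using Apple's ParametricEQ sub-type; the centre frequency, Q, and gain values are placeholders, and the surrounding render chain is only hinted at in a comment):

```swift
import AudioToolbox

// Find and create Apple's ParametricEQ audio unit, then set Fc, Q, and gain.
var description = AudioComponentDescription(
    componentType: kAudioUnitType_Effect,
    componentSubType: kAudioUnitSubType_ParametricEQ,
    componentManufacturer: kAudioUnitManufacturer_Apple,
    componentFlags: 0,
    componentFlagsMask: 0)

if let component = AudioComponentFindNext(nil, &description) {
    var instance: AudioUnit?
    if AudioComponentInstanceNew(component, &instance) == noErr, let eq = instance {
        AudioUnitInitialize(eq)
        // Placeholder values: 1 kHz centre frequency, Q of 1.5, 6 dB cut.
        AudioUnitSetParameter(eq, kParametricEQParam_CenterFreq, kAudioUnitScope_Global, 0, 1_000, 0)
        AudioUnitSetParameter(eq, kParametricEQParam_Q, kAudioUnitScope_Global, 0, 1.5, 0)
        AudioUnitSetParameter(eq, kParametricEQParam_Gain, kAudioUnitScope_Global, 0, -6, 0)
        // The unit would then be wired into a render chain that pulls samples
        // from the MPMediaItem's underlying asset rather than AVAudioPlayer.
    }
}
```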
How do I play multiple audio and change its volume using Novocaine?
thanks!
There is a similar question for which I wrote quite a lengthy response:
Using Novocaine in an audio app
Basically, playing multiple sounds at once involves mixing the various sounds down sample by sample. Changing volume involves multiplying the samples in the audio buffer by some amplitude value; that is, if you want your output to be twice as loud, simply multiply every sample by 2.0f. The Accelerate framework can help you with this.
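A tiny sketch of both steps (Swift, using the Accelerate vDSP overlay available since iOS 13; the buffers and the 2.0 gain are placeholder values, not Novocaine's API):

```swift
import Accelerate

// Two mono buffers to be played at the same time (placeholder data).
let guitar: [Float] = [0.10, 0.20, -0.30, 0.40]
let drums:  [Float] = [0.05, -0.10, 0.20, -0.20]

// Volume change: multiply every sample of one source by its gain.
let drumGain: Float = 2.0                       // "twice as loud"
let scaledDrums = vDSP.multiply(drumGain, drums)

// Mixing: add the sources sample by sample.
let mix = vDSP.add(guitar, scaledDrums)

// In Novocaine's output block you would copy `mix` into the buffer handed to you,
// clamping to [-1, 1] to avoid clipping.
print(mix)
```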
I am currently working on a webcam streaming server project that requires dynamically adjusting the stream's bitrate according to the client's settings (screen size, processing power, ...) or the network bandwidth. The encoder is ffmpeg, since it's free and open source, and the codec is MPEG-4 Part 2. We use live555 for the server part.
How can I encode MBR MPEG-4 videos using ffmpeg to achieve this?
The multi-bitrate video you are describing is called Scalable Video Coding (SVC). See this wiki link for a basic understanding.
Basically, in a scalable video codec, the base layer stream is itself completely decodable, while additional information is represented in the form of one or more enhancement streams. There are a couple of techniques for achieving this, including scaling resolution, frame rate, and quantization. The following papers explain scalable video coding for MPEG-4 and H.264, respectively, in detail. Here is another good paper that explains what you intend to do.
Unfortunately, this is largely a research topic, and to date no open-source encoder (ffmpeg or Xvid) supports such multi-layer encoding. I suspect even commercial encoders don't support it, as it is significantly complex. You could check whether the H.264 reference encoder supports it.
The alternative (but CPU-expensive) approach is to transcode in real time while transmitting the packets; in this case you should start off with reasonably good quality. If you are using FFmpeg as an API, this should not be a problem. Handling multiple resolutions can still be messy, but you can keep changing the target encoding rate.
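Since ffmpeg cannot produce a single layered stream, the closest substitute is to encode the same source several times at different rates and resolutions and let the server switch between the resulting streams (or re-encode on the fly, as described above). A rough invocation, with placeholder file names and untuned rates:

```sh
# Encode one input to MPEG-4 Part 2 at two bitrates / resolutions in a single run.
ffmpeg -i input.avi \
  -c:v mpeg4 -b:v 1200k -s 1280x720 high.m4v \
  -c:v mpeg4 -b:v 400k  -s 640x360  low.m4v
```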