I am using DSPACK with Delphi 6 Pro.
I am looking for a good sample that shows how to create a filter graph that will convert an audio stream to a desired format (sample rate, bit depth, and number of channels) in real time.
Does anyone know of a good example project that shows how to structure the filter graph with DSPACK to do this? If not with DSPACK, then if you know of a good example or web page that discusses the general DirectX filter graph concepts involved, I can use that.
I also know C/C++ and can follow a C# example well enough.
You need a resampling filter to do this. Options include:
implement a filter yourself that does the Audio Resampling
use existing resampling code or a library, see Free Resampling Software
wrap the Media Foundation Audio Resampler DSP, if you are OK with its runtime requirements
use a third-party filter
Once you have such a filter available, you will need to build a transcoding graph with an audio source, the resampler, and the target of your conversion (such as a file).
Also, as far as I remember, the stock ACM Wrapper Filter is capable of converting PCM audio between standard sample rates.
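Since you mention you can follow C/C++, here is a minimal sketch of that transcoding graph against the raw DirectShow COM interfaces (DSPACK exposes these same interfaces to Delphi, so the structure carries over). The resampler CLSID is a placeholder assumption for whichever filter you settle on, and the pin connections and output filter are only indicated, not implemented:

```cpp
// Sketch of the transcoding graph: source -> resampler -> sink.
// Assumes COM has already been initialized (CoInitialize/CoInitializeEx).
#include <dshow.h>
#pragma comment(lib, "strmiids.lib")

// Placeholder: substitute the CLSID of the resampling filter you chose
// (your own filter, a third-party one, or CLSID_ACMWrapper for plain
// PCM sample-rate conversion).
static const GUID CLSID_MyResampler =
    { 0x00000000, 0x0000, 0x0000, { 0,0,0,0,0,0,0,0 } };

HRESULT BuildResampleGraph(const wchar_t* srcFile)
{
    IGraphBuilder* graph     = nullptr;
    IBaseFilter*   source    = nullptr;
    IBaseFilter*   resampler = nullptr;
    IMediaControl* control   = nullptr;

    HRESULT hr = CoCreateInstance(CLSID_FilterGraph, nullptr, CLSCTX_INPROC_SERVER,
                                  IID_IGraphBuilder, (void**)&graph);
    if (FAILED(hr)) return hr;

    // 1. Audio source: let DirectShow pick a suitable source/parser filter.
    hr = graph->AddSourceFilter(srcFile, L"Source", &source);

    // 2. Resampler filter, created by CLSID and added to the graph.
    if (SUCCEEDED(hr))
        hr = CoCreateInstance(CLSID_MyResampler, nullptr, CLSCTX_INPROC_SERVER,
                              IID_IBaseFilter, (void**)&resampler);
    if (SUCCEEDED(hr))
        hr = graph->AddFilter(resampler, L"Resampler");

    // 3. Target: a WAV writer, audio renderer, or network sink is added the
    //    same way; pin enumeration and connection are omitted for brevity.

    // Run the graph once everything is connected.
    if (SUCCEEDED(hr))
        hr = graph->QueryInterface(IID_IMediaControl, (void**)&control);
    if (SUCCEEDED(hr))
        hr = control->Run();

    if (control)   control->Release();
    if (resampler) resampler->Release();
    if (source)    source->Release();
    if (graph)     graph->Release();
    return hr;
}
```

In DSPACK you would do the same thing through its filter graph component: add the source, the resampler, and the writer/renderer, then connect them in that order.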
Related
I've been doing research on the best way to do video processing on iOS using the latest technologies and have gotten a few different results. It seems there are ways to do this with Core Image, OpenGL, and some open-source frameworks as well. I'd like to steer away from the open-source options just so that I can learn what's going on behind the scenes, so the question is:
What is my best option for processing (filters, brightness, contrast, etc.) a pre-recorded video on iOS?
I know Core Image has a lot of great built in filters and has a relatively simple API, but I haven't found any resources on how to actually break down a video into images and then re-encode them. Any help on this topic would be extremely useful, thanks.
As you state, you have several options for this. Whichever you regard as "best" will depend on your specific needs.
Probably your simplest non-open-source route would be to use Core Image. Getting the best performance out of Core Image video filtering will still take a little work, since you'll need to make sure you're doing GPU-side processing for that.
A benchmark application in my GPUImage framework has code that uses Core Image in an optimized manner. To do so, I set up AV Foundation video capture and create a CIImage from the pixel buffer. The Core Image context is set to render to an OpenGL ES context, and the properties on that (colorspace, etc.) are set to render quickly. The settings I use there are ones suggested by the Core Image team when I talked to them about this.
Going the raw OpenGL ES route is something I talk about here (and have a linked sample application there), but it does take some setup. It can give you a little more flexibility than Core Image because you can write completely custom shaders to manipulate images in ways that you might not be able to in Core Image. It used to be that this was faster than Core Image, but there's effectively no performance gap nowadays.
However, building your own OpenGL ES video processing pipeline isn't simple, and it involves a bunch of boilerplate code. It's why I wrote this, and I and others have put a lot of time into tuning it for performance and ease of use. If you're concerned about not understanding how this all works, read through the GPUImageVideoCamera class code within that framework. That's what pulls frames from the camera and starts the video processing operation. It's a little more complex than my benchmark application, because it takes in YUV planar frames from the camera and converts those to RGBA in shaders in most cases, instead of grabbing raw RGBA frames. The latter is a little simpler, but there are performance and memory optimizations to be had with the former.
All of the above was talking about live video, but prerecorded video is much the same, only with a different AV Foundation input type. My GPUImageMovie class has code within it to take in prerecorded movies and process individual frames from that. They end up in the same place as frames you would have captured from a camera.
I am building an iOS app that allows the user to play guitar sounds - e.g. plucking or strumming.
I'd like to allow the user to apply pitch shifting or wah-wah (compression) on the guitar sound being played.
Currently, I am using audio samples of the guitar sound.
I've done some basic read-ups on DSP and audio synthesis, but I'm no expert in it. I saw libraries such as csound and stk, and it appears that the sounds they produced are synthesized (i.e. not played from audio samples). I am not sure how to apply them, or if I can use them to apply effects such as pitch shifting or wah-wah to audio samples.
Can someone point me in the right direction for this?
You can use open-source audio processing libraries. Essentially, you are getting audio samples in and you need to process them and send them as samples out. The processing can be done by these libraries, or you use one of your own. Here's one DSP-Library (Disclaimer: I wrote this). Look at the process(float,float) method for any of the classes to see how one does this.
Wah-wah and compression are two completely different effects. Wah-wah is a lowpass filter whose center frequency varies slowly, whereas compression is a method of evening out the volume. The above library has a Compressor class that you can check out.
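To make the "lowpass filter whose center frequency varies slowly" idea concrete, here is a rough, self-contained C++ sketch of a per-sample wah effect: an LFO sweeps the cutoff of a resonant biquad. It follows the same one-sample-in, one-sample-out style as the process() methods mentioned above, but it is not taken from that library, and the parameter values are only illustrative:

```cpp
// Illustrative wah-wah: resonant lowpass biquad with an LFO-swept cutoff.
#include <cmath>

static const float kPi = 3.14159265358979f;

class SimpleWah {
public:
    explicit SimpleWah(float sampleRate) : fs(sampleRate) {}

    float process(float in)
    {
        // LFO sweeps the cutoff between roughly 400 Hz and 2000 Hz.
        float lfo = 0.5f * (1.0f + std::sin(2.0f * kPi * lfoRate * phase / fs));
        float fc  = 400.0f + 1600.0f * lfo;
        phase += 1.0f;
        if (phase >= fs / lfoRate) phase -= fs / lfoRate;  // keep phase bounded

        // RBJ-cookbook lowpass coefficients for the current cutoff.
        float w0    = 2.0f * kPi * fc / fs;
        float alpha = std::sin(w0) / (2.0f * q);
        float cosw0 = std::cos(w0);
        float a0 = 1.0f + alpha;
        float b0 = (1.0f - cosw0) * 0.5f / a0;
        float b1 = (1.0f - cosw0) / a0;
        float b2 = b0;
        float a1 = -2.0f * cosw0 / a0;
        float a2 = (1.0f - alpha) / a0;

        // Direct form I difference equation.
        float out = b0 * in + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = in;
        y2 = y1; y1 = out;
        return out;
    }

private:
    float fs;
    float lfoRate = 1.5f;   // sweep speed in Hz
    float q = 4.0f;         // resonance
    float phase = 0.0f;
    float x1 = 0, x2 = 0, y1 = 0, y2 = 0;
};
```

You would feed the guitar samples you already have through process() one at a time and send the result to your audio output; pitch shifting is a separate (and harder) effect that needs a dedicated algorithm such as a phase vocoder.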
The STK does have effects classes as well, not just synthesis classes (JCRev is one, for reverb), but I would highly recommend staying away from it, as it is really hard to compile and maintain.
If you haven't seen this already, check out Julius Smith's excellent and comprehensive book Physical Audio Signal Processing.
I'm looking to build a really simple EQ that plays a filtered version of a song in the user's library. It would essentially be a parametric EQ: I'd specify the bandwidth, cut/boost (in dB), and centre frequency, and then be returned some object that I could play just like my original MPMediaItem.
For MPMediaItems, I've generally used AVAudioPlayer in the past with great success. For audio generation, I've used AudioUnits. In MATLAB, I'd probably just create custom filters to do this. I'm at a bit of a loss for how to approach this in iOS! Any pointers would be terrific. Thanks for reading.
iOS ships with a fairly sizeable number of audio units. One of kAudioUnitSubType_ParametricEQ, kAudioUnitSubType_NBandEQ or kAudioUnitSubType_BandPassFilter is probably what you want depending on whether you want to control Q as well as Fc and Gain.
I suspect you will have to forego using higher-level components such as AVAudioPlayer to make use of it.
The relevant iOS audio unit reference can be found here.
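For what it's worth, instantiating and configuring that parametric EQ unit through the C-level Audio Unit API looks roughly like the sketch below (error handling trimmed). Wiring the unit into a playback chain and feeding it the decoded audio of your MPMediaItem is the part you still have to build, which is why AVAudioPlayer alone won't get you there:

```cpp
// Sketch: create kAudioUnitSubType_ParametricEQ and set centre frequency,
// Q, and gain. Hooking it into an AUGraph / Remote IO chain is omitted.
#include <AudioUnit/AudioUnit.h>

AudioUnit CreateParametricEQ(float centreHz, float q, float gainDb)
{
    AudioComponentDescription desc = {};
    desc.componentType         = kAudioUnitType_Effect;
    desc.componentSubType      = kAudioUnitSubType_ParametricEQ;
    desc.componentManufacturer = kAudioUnitManufacturer_Apple;

    AudioComponent comp = AudioComponentFindNext(NULL, &desc);
    if (!comp) return NULL;

    AudioUnit eq = NULL;
    AudioComponentInstanceNew(comp, &eq);
    AudioUnitInitialize(eq);

    // The three parametric EQ parameters: Fc (Hz), Q, and gain (dB).
    AudioUnitSetParameter(eq, kParametricEQParam_CenterFreq,
                          kAudioUnitScope_Global, 0, centreHz, 0);
    AudioUnitSetParameter(eq, kParametricEQParam_Q,
                          kAudioUnitScope_Global, 0, q, 0);
    AudioUnitSetParameter(eq, kParametricEQParam_Gain,
                          kAudioUnitScope_Global, 0, gainDb, 0);
    return eq;
}
```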
I'm trying to create a lightweight diphone speech synthesizer. Everything seems pretty straightforward because my native language has pretty simple pronunciation and text processing rules. The only problem I've stumbled upon is pitch control.
As far as I understand, to control the pitch of the voice, most speech synthesizers are using LPC (linear predictive coding), which essentially separates the pitch information away from the recorded voice samples, and then during synthesis I can supply my own pitch as needed.
The problem is that I'm not a DSP specialist. I have used the Ooura FFT library to extract AFR information, and I know a little bit about using Hann and Hamming windows (I have implemented the C++ code myself), but mostly I treat DSP algorithms as black boxes.
I hoped to find some open-source library that is just bare LPC code with usage examples, but I couldn't find anything. Most of the available code (like the Festival engine) is tightly integrated into the synthesizer, and it would be a pretty hard task to separate it and learn how to use it.
Is there any C/C++/C#/Java open-source DSP library with a "black box" style LPC algorithm and usage examples, so I can just throw PCM sample data at it and get the LPC-coded output, and then feed the coded data back in and synthesize the decoded speech?
It's not exactly what you're looking for, but maybe you'll get some ideas from this quite sophisticated toolbox: Praat.
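In case it helps to see how small "bare" LPC analysis actually is, below is an illustrative C++ sketch of the classic autocorrelation plus Levinson-Durbin step that turns one windowed PCM frame into LPC coefficients. Synthesis is then the inverse: run an excitation at your chosen pitch (a pulse train for voiced sounds, noise for unvoiced) through the all-pole filter 1/A(z). This is a teaching sketch, not tuned production code:

```cpp
// Illustrative LPC analysis of one windowed PCM frame:
// autocorrelation followed by the Levinson-Durbin recursion.
#include <vector>

// Returns coefficients a[1..order] of A(z) = 1 + a[1]z^-1 + ... + a[order]z^-order,
// so that x[n] is approximated by -sum_{k=1..order} a[k] * x[n-k].
std::vector<double> lpcAnalyze(const std::vector<double>& frame, int order)
{
    const int n = static_cast<int>(frame.size());

    // Autocorrelation r[0..order].
    std::vector<double> r(order + 1, 0.0);
    for (int lag = 0; lag <= order; ++lag)
        for (int i = lag; i < n; ++i)
            r[lag] += frame[i] * frame[i - lag];

    std::vector<double> a(order + 1, 0.0);   // a[0] = 1 is implied
    double err = r[0];
    if (err <= 0.0) return a;                // silent frame

    // Levinson-Durbin recursion.
    for (int i = 1; i <= order; ++i) {
        double acc = r[i];
        for (int j = 1; j < i; ++j)
            acc += a[j] * r[i - j];
        double k = -acc / err;               // reflection coefficient

        std::vector<double> prev(a);
        a[i] = k;
        for (int j = 1; j < i; ++j)
            a[j] = prev[j] + k * prev[i - j];

        err *= (1.0 - k * k);                // remaining prediction error
    }
    return a;
}
```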
I am currently working on a webcam streaming server project that requires the ability to dynamically adjust the stream's bitrate according to the client's settings (screen size, processing power...) or the network bandwidth. The encoder is ffmpeg, since it's free and open source, and the codec is MPEG-4 Part 2. We use live555 for the server part.
How can I encode MBR MPEG-4 videos using ffmpeg to achieve this?
The multi-bitrate video you are describing is called "Scalable Video Coding" (SVC). See this wiki link for a basic understanding.
Basically, in a scalable video codec, the base layer stream is itself completely decodable; additional information is represented in the form of one or more enhancement streams. There are a couple of techniques for achieving this, including scaling the resolution, the frame rate, and the quantization. The following papers explain the details of scalable video coding for MPEG-4 and H.264, respectively. Here is another good paper that explains what you intend to do.
Unfortunately, this is still largely a research topic, and to date no open-source encoder (ffmpeg or Xvid) supports such multi-layer encoding. I suspect even commercial encoders don't support it; it is significantly complex. You could check whether the reference encoder for H.264 supports it.
The alternative (but CPU-expensive) approach is to transcode in real time while transmitting the packets. In this case, you should start off with a source of reasonably good quality. If you are using FFmpeg as an API, this should not be a problem. Changing resolution on the fly can still be messy, but you can keep adjusting the target encoding rate.
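To illustrate the API route, here is a rough libavcodec sketch of opening an MPEG-4 Part 2 encoder at a chosen bitrate; adapting to the client then amounts to closing the context and reopening it with a new bit_rate (and resolution, if needed) between GOPs while live555 keeps streaming. Field and function names have shifted across ffmpeg versions (older builds also require avcodec_register_all() at startup), so treat this as the shape of the code rather than a drop-in recipe:

```cpp
// Sketch only: configure an MPEG-4 Part 2 encoder at a given target bitrate.
// To adapt to the network, close this context and reopen it with a new
// bit_rate (and, if needed, new dimensions) at a GOP boundary.
extern "C" {
#include <libavcodec/avcodec.h>
}

AVCodecContext* openMpeg4Encoder(int width, int height, int fps, int bitrate)
{
    const AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_MPEG4);
    if (!codec) return nullptr;

    AVCodecContext* ctx = avcodec_alloc_context3(codec);
    ctx->width     = width;
    ctx->height    = height;
    ctx->time_base = AVRational{1, fps};
    ctx->pix_fmt   = AV_PIX_FMT_YUV420P;
    ctx->gop_size  = fps;            // one keyframe per second
    ctx->bit_rate  = bitrate;        // the knob you keep adjusting

    if (avcodec_open2(ctx, codec, nullptr) < 0) {
        avcodec_free_context(&ctx);
        return nullptr;
    }
    return ctx;
}
```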