Hope this question makes some sense, I'm completely lost....
In my prototype app I'm recording microphone input and saving it, and so far no problems at all.
I now need to access the buffer while I'm recording it in order to pass chunks of data to another class (written in C, not by me) that will do some analysis.
I spent the whole day browsing and reading, and it looks like I need to use Audio Queues in order to access the buffer.
The problem is that the syntax is C, and I don't understand it at all :)
So my questions are:
1) Is there any other way to achieve what I'm looking for? I don't need an in-depth explanation, just some hints and I will browse my way through :) I'm asking because I'm not 100% sure that Audio Queues are the only way to go.
2) Any good tutorial or example on Audio Queues? The aurioTouch sample by Apple wasn't very useful (again, I don't know C). I could get around my problems with C by following a good tutorial that a noob like me can understand.
Thanks a lot for any help you can offer.
Good question.
You can use code written by other people like:
Novocaine - pretty straightforward (but there were some bugs, at least in the older version I used ~6 months ago, something to do with mono vs. stereo).
MoMu - quite a good C++ toolkit (you need to use the .mm extension for your files).
Those will save you time if you want some low-level audio programming. Some basic skills in C are still required, though. Check out this guy. His explanations and enthusiasm are excellent.
With everything mentioned above you can be ready in 1-2 days of work, carrying away good C skills.
EDIT
Basically, anywhere you work with low-level audio you deal with a C array of numbers (declared like float *audioBuffer;) called audio samples. You cycle through it in a loop, do some operations, copy it, send it somewhere, analyze it.
To copy it you have to allocate space for it. The actual byte size of the buffer can be calculated as numberOfSamples * sizeof(type).
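For example, here's a minimal sketch of an Audio Queue input callback that copies each recorded chunk out of the queue's buffer so it can be handed off for analysis (assuming the queue was set up for float linear PCM; MyAnalyzeChunk is a made-up stand-in for your C analysis function):

    #include <AudioToolbox/AudioToolbox.h>
    #include <stdlib.h>
    #include <string.h>

    // Hypothetical analysis entry point, implemented elsewhere in C.
    extern void MyAnalyzeChunk(float *samples, size_t count);

    // Called by the system each time a buffer of recorded audio is ready.
    static void MyInputCallback(void *inUserData,
                                AudioQueueRef inQueue,
                                AudioQueueBufferRef inBuffer,
                                const AudioTimeStamp *inStartTime,
                                UInt32 inNumPackets,
                                const AudioStreamPacketDescription *inPacketDesc)
    {
        // Allocate space for the copy: byte size = numberOfSamples * sizeof(type).
        UInt32 byteSize = inBuffer->mAudioDataByteSize;
        float *chunk = malloc(byteSize);
        if (chunk) {
            memcpy(chunk, inBuffer->mAudioData, byteSize);
            MyAnalyzeChunk(chunk, byteSize / sizeof(float)); // receiver free()s it
        }

        // Hand the buffer back to the queue so recording continues.
        AudioQueueEnqueueBuffer(inQueue, inBuffer, 0, NULL);
    }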
I understand that AKSampler was recently rewritten, and this GitHub project seems to be the de facto guide to the new AKSampler. What I can gather is a move toward the SFZ format. I am new to the sampling world, but in my application I only need a handful of samples recorded from my piano for it to work. Looking around at existing SFZ formats and samples, I do not need all of the complexity and features that SFZ provides.
I am currently using AKSampler with a single piano sample, which works perfectly; however, it gets a bit weird once I play anything too far from the original sample, so I just want to fill in the gaps with a few other samples (I only need to play around an octave and a half with my current app).
According to the docs I do see a couple of methods, buildSimpleKeyMap() and buildKeyMap(), but there is currently no implementation.
Do I have any other options? I know that the EXS format has been deprecated, as has SoundFont. Is SFZ currently the only way to map multiple samples to AKSampler?
Thanks for all your help <3
Edit: This readme on the AKSampler GitHub page provides the breakdown for samples. I still only see SFZ being considered. If anyone else is lost with my question or needs a reference, this seems to be the best resource. If the current AKSampler only offers SFZ as the primary way to map multiple samples, so be it, but it does look very challenging. I'm really hoping there is some simple middle ground between using a single sample with AKSampler and a full-bore SFZ file.
Edit 2: Getting a solution to this, will update as soon as possible, thanks for your patience!
I have provided a simple explainer and sample file in the AudioKit docs. Hope this helps new users of AudioKit!
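For anyone else who lands here: a key map for a handful of samples really only needs a few lines of SFZ. Something like this (sample filenames and key ranges are made up) covers about an octave and a half with three piano samples, each region pitched relative to its recorded root note; whether a given loader reads every opcode beyond these basics is a separate question:

    <group>
    // lokey/hikey = the MIDI key range this sample covers,
    // pitch_keycenter = the note the sample was actually recorded at.
    <region> sample=piano_A3.wav lokey=53 hikey=59 pitch_keycenter=57
    <region> sample=piano_E4.wav lokey=60 hikey=66 pitch_keycenter=64
    <region> sample=piano_B4.wav lokey=67 hikey=74 pitch_keycenter=71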
I want to make a really simple synth.
In short, I want to play a WAV file and have it loop between certain points until the touch is released.
I am looking for some example code (it doesn't need to be free).
Sorry for such a basic question; I have been googling this, but there seems to be nothing on this exact topic, unless I'm missing some important term.
Also, is what I'm describing a wavetable synth, or a soundboard?
I'd call it a sampler.
Here's a sample project that will get you started:
https://sites.google.com/site/iphonecoreaudiodevelopment/remoteio-playback
See also:
The Audio Programming Book
The Core Audio Book
A sample project of mine
You need to store the sound data in memory, and have some sort of read() command that fills an array of bytes to send to the sound card. The read() command has to keep track of its position between reads, so a persistent pointer must be maintained. You will test the position of the pointer and see if you've reached the end or not, and reset to the beginning when needed.
The specifics are going to depend upon your chosen language, of course.
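A minimal sketch in C of that kind of looping read() (struct layout and loop points are made up; assumes 16-bit mono PCM held in memory):

    #include <stddef.h>

    // Persistent playback state for one looping sound.
    typedef struct {
        const short *data;   // 16-bit PCM sample data in memory
        size_t totalFrames;  // total length of the sound
        size_t loopStart;    // frame where the loop begins
        size_t loopEnd;      // frame where the loop wraps around
        size_t pos;          // read position, persists between calls
        int    looping;      // cleared when the touch is released
    } Voice;

    // Fill 'out' with up to 'frames' samples, wrapping inside the loop
    // region while 'looping' is set and running to the end once it isn't.
    size_t voice_read(Voice *v, short *out, size_t frames)
    {
        size_t written = 0;
        while (written < frames && v->pos < v->totalFrames) {
            out[written++] = v->data[v->pos++];
            if (v->looping && v->pos >= v->loopEnd)
                v->pos = v->loopStart;  // jump back to the loop start
        }
        return written;  // fewer than 'frames' means the sound has finished
    }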
I did this with Java, with the added possibility of playback at different speeds.
http://www.hexara.com/VSL/VSL2.htm
It's a little laggy. I've learned a bit since posting that, but haven't gone back to fix it yet. The program asks permission and has you load a WAV file from your computer. It should be 16-bit, stereo, 44100 fps, little-endian.
WaveTable synthesis is a bit different, in that only a single iteration of a wave is stored and used as source data.
Here is a short discussion, from Stanford's CCRMA website:
https://ccrma.stanford.edu/~bilbao/booktop/node9.html
I used this method to make a Java "Theremin"
http://www.hexara.com/VSL/JTheremin.htm
With a WaveTable, you decide on the size of the array. If it is a power of 2, one can bitmask the pointer after every increment, which is faster than doing a comparison and reset.
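For example (a sketch; the size and contents of the table are up to you):

    #define TABLE_SIZE 1024                // must be a power of 2
    #define TABLE_MASK (TABLE_SIZE - 1)

    static float table[TABLE_SIZE];        // one cycle of the waveform
    static unsigned phase = 0;             // current read position

    // Return the next sample; the bitmask wraps 1023 -> 0 with no branch.
    static float next_sample(void)
    {
        float s = table[phase];
        phase = (phase + 1) & TABLE_MASK;
        return s;
    }

A real oscillator steps the phase by a fractional increment (and interpolates) so the pitch isn't locked to sampleRate / TABLE_SIZE, but the masking trick is the same.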
My aim is to code a project that records a human voice and changes it (with effects).
e.g.: a person records their voice over the microphone (speaks for a while), and then the program makes it sound like a baby.
This must run efficiently and fast (the altering operation has to run while recording, too).
What is the optimum way to do it?
Thanks
If you're looking for either XNA or DirectX to do this for you, I'm pretty sure you're going to be out of luck (I don't have much experience with DirectSound; maybe somebody can correct me). What it sounds like you want to do is realtime digital signal processing, which means that you're either going to need to write your own code to manipulate the raw waveform, or find somebody else who's already written the code for you.
If you don't have experience writing this sort of thing, it's probably best to use somebody else's signal processing library, because this sort of thing can quickly get complicated. Since you're developing for the PC, you're in luck; you can use any library you like using P/Invoke. You might try out some of the solutions suggested here and here.
MSDN has some info about the Audio namespace from XNA, and the audio recording introduced in version 4:
Working with Microphones
Recording Audio from a Microphone
Keep in mind that recorded data is returned in PCM format.
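To give a feel for what manipulating the raw waveform means here, the crudest possible baby-voice effect is just resampling the PCM data to raise the pitch. A minimal sketch (assuming 16-bit mono PCM; note that nearest-neighbour resampling also speeds the audio up, which is why real voice changers use fancier DSP):

    #include <stddef.h>

    // Naively pitch-shift 16-bit mono PCM up by 'factor' (e.g. 1.5)
    // by stepping through the input faster than the output.
    size_t pitch_up(const short *in, size_t inFrames,
                    short *out, size_t outCapacity, double factor)
    {
        size_t outFrames = 0;
        double pos = 0.0;
        while ((size_t)pos < inFrames && outFrames < outCapacity) {
            out[outFrames++] = in[(size_t)pos];  // nearest-neighbour read
            pos += factor;                       // advance faster => higher pitch
        }
        return outFrames;
    }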
I just saw an iPhone app which uses wavetables to generate sounds, and I wish to know how it can be implemented.
I am pretty sure that Core Audio has to be used, but any other ideas on where to go for more info would be appreciated.
You'll want Core Audio or Audio Units for a responsive program (e.g. Audio Queue's latency is a bit high).
You'll want AudioFile APIs (in AudioToolbox) for reading the tables if you save them as a common audio file format (just wave files with a new shape every cycle, which is every N samples).
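For example, the ExtAudioFile flavour of those APIs can convert to float for you while reading. A sketch of loading a whole wavetable file (error handling mostly omitted; details like mono channel layout are assumptions):

    #include <AudioToolbox/AudioToolbox.h>
    #include <stdlib.h>

    // Load an entire mono audio file as float samples; caller free()s.
    float *load_table(CFURLRef url, UInt32 *outFrames)
    {
        ExtAudioFileRef file = NULL;
        if (ExtAudioFileOpenURL(url, &file) != noErr) return NULL;

        // Read the file's native format so we keep its sample rate.
        AudioStreamBasicDescription fileFmt;
        UInt32 size = sizeof(fileFmt);
        ExtAudioFileGetProperty(file, kExtAudioFileProperty_FileDataFormat,
                                &size, &fileFmt);

        // Ask Core Audio to hand us packed 32-bit float, mono.
        AudioStreamBasicDescription fmt = {0};
        fmt.mSampleRate       = fileFmt.mSampleRate;
        fmt.mFormatID         = kAudioFormatLinearPCM;
        fmt.mFormatFlags      = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked;
        fmt.mChannelsPerFrame = 1;
        fmt.mBitsPerChannel   = 32;
        fmt.mBytesPerFrame    = sizeof(float);
        fmt.mFramesPerPacket  = 1;
        fmt.mBytesPerPacket   = sizeof(float);
        ExtAudioFileSetProperty(file, kExtAudioFileProperty_ClientDataFormat,
                                sizeof(fmt), &fmt);

        SInt64 frames = 0;
        size = sizeof(frames);
        ExtAudioFileGetProperty(file, kExtAudioFileProperty_FileLengthFrames,
                                &size, &frames);

        float *samples = malloc((size_t)frames * sizeof(float));
        AudioBufferList buf;
        buf.mNumberBuffers = 1;
        buf.mBuffers[0].mNumberChannels = 1;
        buf.mBuffers[0].mDataByteSize   = (UInt32)(frames * sizeof(float));
        buf.mBuffers[0].mData           = samples;

        UInt32 ioFrames = (UInt32)frames;
        ExtAudioFileRead(file, &ioFrames, &buf);
        ExtAudioFileDispose(file);

        *outFrames = ioFrames;
        return samples;
    }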
Beyond that, you'll probably have to write the wavetable engine. I have done that; it's not tough if you know how wavetable synthesis works and are familiar with audio signals. It's one of the most basic synthesis types.
musicdsp.org may have something you can use as a starting point for this.
After much investigation I found an open source project on this: http://gitorious.org/pdlib/
Audio file I/O: I found a great resource here. This guy created an excellent API for using ExtAudioFileServices.
A must-read is Learning Core Audio. Chris Adamson and company have really put together a great resource. Chris's blog can also be found here.
Also, sign up for the Core Audio mailing list.
Michael Tyson's blog/ resources are great too A Tasty Pixel.
Hope this helps!
Take a look at this tutorial on how to use the STK: http://arielelkin.github.io/articles/mandolin/
It is an open-source C++ library with cool synths, some with wavetables.
I am searching for an algorithm to determine whether realtime audio input matches one of 144 given (and comfortably distinct) phoneme pairs.
Preferably at the lowest level that does the job.
I'm developing radical / experimental musical training software for iPhone / iPad.
My musical system comprises 12 consonant phonemes and 12 vowel phonemes, demonstrated here. That makes 144 possible phoneme pairs. The student has to sing the correct phoneme pair 'laa duu bee' etc in response to visual stimulus.
I have done a lot of research into this, and it looks like my best bet may be to use one of the iOS Sphinx wrappers (iPhone App › Add voice recognition? is the best source of information I have found). However, I can't see how I would adapt such a package; can anyone with experience using one of these technologies give a basic rundown of the steps that would be required?
Would training be necessary by the user? I would have thought not, as it is such an elementary task compared with full language models of thousands of words and a far greater and more subtle phoneme base. However, it would be acceptable (though not ideal) to have the user train 12 phoneme pairs: { consonant1+vowel1, consonant2+vowel2, ..., consonant12+vowel12 }. The full 144 would be too burdensome.
Is there a simpler approach? I feel like using a fully featured continuous speech recogniser is using a sledgehammer to crack a nut. It would be far more elegant to use the minimum technology that would solve the problem.
So really I'm hunting for any open source software that recognises phonemes.
PS: I need a solution that runs pretty much in real time, so even as they are singing the note, it first blinks on to show that it picked up the phoneme pair that was sung, and then glows to show whether they are singing the correct note pitch.
If you are looking for a phone-level open source recogniser, then I would recommend HTK. Very good documentation is available with this tool in the form of the HTK Book. It also contains an entire chapter dedicated to building a phone-level real-time speech recogniser. From your problem statement above, it seems to me like you might be able to re-work that example into your own solution. Possible pitfalls:
Since you want to build a phone-level recogniser, the amount of data needed to train the phone models would be very large. Also, your training database should be balanced in terms of the distribution of the phones.
Building a speaker-independent system would require data from more than one speaker. And lots of that too.
Since this is open source, you should also check the licensing info for any additional details about shipping the code. A good alternative would be to use the on-phone recorder and then send the recorded waveform over a data channel to a server for recognition, pretty much like what Google does.
I have a little bit of experience with this type of signal processing, and I would say that this is probably not the type of finite question that can be answered definitively.
One thing worth noting is that although you may restrict the phonemes you are interested in, the possibility space remains the same (i.e. infinite-ish). User training might help the algorithms along a bit, but useful training takes quite a bit of time and it seems you are averse to too much of that.
Using Sphinx is probably a great start on this problem. I haven't gotten very far in the library myself, but my guess is that you'll be working with its source code yourself to get exactly what you want. (Hooray for open source!)
...using a sledgehammer to crack a nut.
I wouldn't label your problem a nut, I'd say it's more like a beast. It may be a different beast than natural language speech recognition, but it is still a beast.
All the best with your problem solving.
Not sure if this would help: check out OpenEars' LanguageModelGenerator. OpenEars uses Sphinx and other libraries.
http://www.hfink.eu/matchbox
This page links to both a YouTube video demo and the GitHub source.
I'm guessing it would still be a lot of work to mould it into the shape I'm after, but it definitely does do a lot of the work.