I understand that AKSampler was recently rewritten, and this GitHub project seems to be the de facto guide on the new AKSampler. From what I can gather, there is a move toward the SFZ format. I am new to the sampling world, but in my application I only need a handful of samples recorded from my piano for it to work. Having looked around at existing SFZ files and samples, I do not need all of the complexity and features that SFZ provides.
I am currently using AKSampler with a single piano sample, which works perfectly; however, it gets a bit weird once I play anything too far from the original sample, so I just want to fill in the gaps with a few other samples (I only need to play around an octave and a half in my current app).
According to the docs, there are a couple of methods, buildSimpleKeyMap() and buildKeyMap(), but there is currently no implementation.
Do I have any additional options? I know that the EXS format has been deprecated, as has SoundFont. Is SFZ currently the only way to map multiple samples to AKSampler?
Thanks for all your help <3
Edit: This readme on the AKSampler GitHub page provides the breakdown for samples. I still only see SFZ being considered. If anyone else is lost with my question or needs a reference, this seems to be the best resource. If the current AKSampler only offers SFZ as the primary way to map multiple samples, so be it; however, it does look very challenging. I'm really hoping there is some simple middle ground between using a single sample for AKSampler and a full-bore SFZ file.
Edit 2: Getting a solution to this, will update as soon as possible, thanks for your patience!
I have provided a simple explainer and sample file in the AudioKit docs. Hope this helps new users of AudioKit!
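For anyone who gets stuck at the same point before finding that explainer: a multi-sample SFZ mapping for a case like this can stay very small. Below is a hedged sketch (the .wav names and key ranges are placeholders, not the file from the docs). Each <region> line maps one sample to a key range via lokey/hikey and tells the engine which note the sample was recorded at via pitch_keycenter. AKSampler's built-in SFZ support is intentionally simple, so compare against the sample files shipped with the repo before relying on any other opcodes.

```
<group>
<region> sample=piano_C3.wav lokey=43 hikey=53 pitch_keycenter=48
<region> sample=piano_C4.wav lokey=54 hikey=65 pitch_keycenter=60
<region> sample=piano_G4.wav lokey=66 hikey=77 pitch_keycenter=67
```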
Related
Over the past week or so I've been trying to find a workflow for generating original samples to eventually be used in an AudioKit app. Obviously, at some point I'm going to have to decide which of AudioKit's sampler classes best fits my needs, and I have a few questions about what I have found so far:
I won't be getting Logic Pro anytime soon, so I think EXS24 is not an option right now. As a side question - are there any other apps that can generate EXS24 (.exs) files?
The MIDISampler in AudioKit has a method .loadSoundFont( - is there a documented example of this? (I couldn't find any in the Cookbook.)
I was able to get this working with .sf2 files, but I want to make sure I'm doing it properly.
There is a Sampler class in DunneAudioKit - I was also able to establish a workflow with this using .sfz files, which seems pretty good.
Currently I am deliberating between the MIDISampler in AudioKit and the Sampler in DunneAudioKit.
Are there any other options in the AudioKit framework(s) that I should consider? Also, I'm guessing the .sf2 and .sfz file formats are probably going to stay in use for some time - has anybody heard about either of those two being deprecated or changed in a major way?
GarageBand can make AUPreset files and MainStage can make EXS24 files suitable for AppleSampler. I've not tried the DunneSampler but I believe that is the one AudioKit Pro uses for several of their commercial apps. There isn't an example in the Cookbook for SF2 in AppleSampler but it should be very similar. SF2 has been around for a very long time and I imagine SFZ will be too.
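For what it's worth, here is a rough sketch of what the SF2 route looks like with AudioKit's MIDISampler. I'm assuming AudioKit 5-style names (AudioEngine, MIDISampler, loadSoundFont(_:preset:bank:), play(noteNumber:velocity:channel:)) and a bundled "MyPiano.sf2" file; those are placeholders, so double-check the exact signatures against the version you're on.

```swift
import AudioKit

let engine = AudioEngine()
let sampler = MIDISampler()            // wraps Apple's Sampler audio unit
engine.output = sampler

do {
    // "MyPiano" stands in for an .sf2 file bundled with the app.
    try sampler.loadSoundFont("MyPiano", preset: 0, bank: 0)
    try engine.start()
    try sampler.play(noteNumber: 60, velocity: 90, channel: 0)   // middle C
} catch {
    print("Could not load SoundFont or start the engine: \(error)")
}
```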
Hope this question makes some sense; I'm completely lost...
In my proto-app I'm recording mic input and saving it, and so far there have been no problems at all.
I now need to access the buffer while I'm recording it in order to pass chunks of data to another class (written in C, not by me) that will do some analysis.
I spent the whole day browsing and reading, and it looks like I need to use Audio Queues in order to access the buffer.
The problem is that the syntax is C, and I don't understand it at all :)
So my questions are:
1) Is there any other way to achieve what I'm looking for? I don't need an in-depth explanation, just some hints, and I will browse my way through :) I'm asking because I'm not 100% sure that Audio Queues are the only way to go.
2) Any good tutorial or example on Audio Queues? The aurioTouch tutorial by Apple wasn't very useful (again, I don't know C). I could get past my problems with C by following a good tutorial that a noob like me can understand.
Thanks a lot for any help you can offer.
Good question.
You can use code written by other people like:
Novocaine - pretty straightforward (but there are some bugs, at least in the older version I used ~6 months ago - something with mono and stereo).
Momu - quite a good thing in C++ (you need to use the .mm extension for your files).
Those will save you time if you want to do some low-level audio programming. Some basic C skills are still required, though. Check out this guy; his explanations and enthusiasm are excellent.
With everything mentioned above, you can be ready after 1-2 days of work and come away with good C skills.
EDIT
Basically, everywhere you work with low-level audio you deal with a C array of numbers (represented like float *audioBuffer;) called audio samples. You cycle through it in a loop, do some operations, copy it, send it somewhere, analyze it.
To copy it you have to allocate space for it. The actual byte size of the buffer can be calculated as numberOfSamples * sizeof(type).
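To make that concrete, here is a minimal Swift sketch of the allocate-then-copy step (the function and parameter names are just illustrative; in plain C the same thing is malloc plus memcpy with numberOfSamples * sizeof(float) bytes):

```swift
import Foundation

// Copy `count` Float samples out of an audio buffer so they can be handed
// to an analysis routine while recording continues.
func copyChunk(from audioBuffer: UnsafePointer<Float>, count: Int) -> UnsafeMutablePointer<Float> {
    // Allocate count * MemoryLayout<Float>.size bytes (numberOfSamples * sizeof(type)).
    let chunk = UnsafeMutablePointer<Float>.allocate(capacity: count)
    chunk.initialize(from: audioBuffer, count: count)   // copy the samples
    return chunk                                        // caller must call chunk.deallocate()
}
```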
I've been searching for some examples that show how to do ADSR in iOS using audio samples (preferably WAV files with loop points, but that's secondary). I guess most people who write a sampler/synth app use an audio unit for this. Does anyone know a good code example that shows ADSR in any iOS audio library?
In the new iOS 5.0 SDK there's now a Sampler Audio Unit, which can do ADSR envelopes.
The presets demo shows how to use the sampler:
http://developer.apple.com/library/ios/#samplecode/LoadPresetDemo/Introduction/Intro.html#//apple_ref/doc/uid/DTS40011214
If you want to load different sound formats to play this article is helpful:
https://developer.apple.com/library/mac/#technotes/tn2283/_index.html
And here's the iOS documentation reference:
http://developer.apple.com/library/ios/#documentation/AudioUnit/Reference/AUComponentServicesReference/Reference/reference.html#//apple_ref/doc/uid/TP40007291
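On later SDKs, the AVFoundation wrapper AVAudioUnitSampler gives you the same Sampler unit without the AUGraph boilerplate from the demo. A rough sketch, with placeholder file names (an .exs or audio-file instrument loads the same way as the .aupreset):

```swift
import AVFoundation

let engine = AVAudioEngine()
let sampler = AVAudioUnitSampler()     // Apple's Sampler AU; its preset carries the ADSR settings
engine.attach(sampler)
engine.connect(sampler, to: engine.mainMixerNode, format: nil)

do {
    // "Piano.aupreset" is a placeholder for an instrument bundled with the app.
    if let url = Bundle.main.url(forResource: "Piano", withExtension: "aupreset") {
        try sampler.loadInstrument(at: url)
    }
    try engine.start()
    sampler.startNote(60, withVelocity: 90, onChannel: 0)   // middle C
} catch {
    print("Sampler setup failed: \(error)")
}
```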
You can find a (very basic) one in Apple's SinSynth sample. That is an AU, but it should demonstrate how one would apply an envelope to an audio buffer. I don't remember exactly - it may simply be an ASR, but adding a fourth stage is simple once you have understood the existing program. The implementation is right in the note's render function.
Envelope generators are not platform-specific.
musicdsp.org will be a better resource if you want more than a push in the right direction.
MusicDSP has source code for an example envelope follower with attack/release. If you understand this, then sustain/decay should be pretty logical. ;)
But an ADSR envelope is basically just a matter of applying gain to your output signal with a state machine. Each state has a starting value, an ending value, and a duration. Calculating the slope of that line and the value of each point along it was covered in your algebra class back in high school. ;) If you want to be really fancy, you can implement other types of curves, but the concept remains the same.
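To make the state-machine idea concrete, here is a small, untested sketch of a linear ADSR gain generator; the parameter names and values are purely illustrative:

```swift
// Linear ADSR envelope: each stage ramps the gain from a start value to an
// end value over a fixed number of samples; sustain holds until noteOff().
struct ADSR {
    enum Stage { case idle, attack, decay, sustain, release }

    var stage: Stage = .idle
    var gain: Float = 0

    // Stage lengths in samples (seconds * sampleRate) and the sustain level.
    var attackSamples: Float = 441      // ~10 ms at 44.1 kHz
    var decaySamples: Float = 4410      // ~100 ms
    var sustainLevel: Float = 0.7
    var releaseSamples: Float = 8820    // ~200 ms

    mutating func noteOn()  { stage = .attack }
    mutating func noteOff() { stage = .release }

    // Call once per sample; multiply the returned gain into your output signal.
    mutating func next() -> Float {
        switch stage {
        case .idle:
            gain = 0
        case .attack:
            gain += 1.0 / attackSamples                 // slope = (end - start) / duration
            if gain >= 1 { gain = 1; stage = .decay }
        case .decay:
            gain -= (1 - sustainLevel) / decaySamples   // ramp down to the sustain level
            if gain <= sustainLevel { gain = sustainLevel; stage = .sustain }
        case .sustain:
            break                                       // hold until noteOff()
        case .release:
            gain -= sustainLevel / releaseSamples       // ramp down to silence
            if gain <= 0 { gain = 0; stage = .idle }
        }
        return gain
    }
}
```

Multiplying each output sample by next() applies the envelope; swapping the straight ramps for exponential curves is the "really fancy" version mentioned above.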
I just saw an iPhone app that uses wavetables to generate sounds. I would like to know how this can be implemented.
I am pretty sure that Core Audio has to be used, but any other pointers on where to go for more information would be appreciated.
You'll want CoreAudio or AudioUnits for a responsive program (e.g. AudioQueue's latency is a bit high).
You'll want AudioFile APIs (in AudioToolbox) for reading the tables if you save them as a common audio file format (just wave files with a new shape every cycle, which is every N samples).
Beyond that, you'll probably have to write the wavetable engine. I have done that; it's not tough if you know how wavetable synthesis works and are familiar with audio signals. It's one of the most basic synthesis types (see the sketch below).
musicdsp.org may have something you can use as a starting point for this.
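To give a sense of how small the core of a wavetable engine is, here is a rough sketch of a single-cycle table read with linear interpolation (the table size, names, and values are arbitrary):

```swift
import Foundation

// One cycle of the waveform lives in `table`; the oscillator steps through it
// at a rate proportional to the desired frequency and interpolates between entries.
struct WavetableOscillator {
    let table: [Float]
    let sampleRate: Float
    var phase: Float = 0                  // current read position, in table indices

    init(table: [Float], sampleRate: Float = 44_100) {
        self.table = table
        self.sampleRate = sampleRate
    }

    mutating func next(frequency: Float) -> Float {
        let size = Float(table.count)
        let i = Int(phase)
        let frac = phase - Float(i)
        let a = table[i]
        let b = table[(i + 1) % table.count]
        let sample = a + (b - a) * frac           // linear interpolation between entries

        phase += frequency * size / sampleRate    // advance by (frequency / sampleRate) cycles
        if phase >= size { phase -= size }        // wrap back to the start of the table
        return sample
    }
}

// Example: a 512-point sine table rendered at 440 Hz.
let sine = (0..<512).map { Float(sin(2 * Double.pi * Double($0) / 512)) }
var osc = WavetableOscillator(table: sine)
let block = (0..<256).map { _ in osc.next(frequency: 440) }   // one chunk of output samples
```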
After a lot of investigating, I have found an open-source project related to this: http://gitorious.org/pdlib/
Audio file I/O: I found a great resource here. This guy created an excellent API for using ExtAudioFileServices.
A must-read is Learning Core Audio; Chris Adamson and company have really put together a great resource. Chris's blog can also be found here
Also, sign up for the Core Audio mailing list.
Michael Tyson's blog/resources are great too: A Tasty Pixel.
Hope this helps!
Take a look at this tutorial on how to use the STK: http://arielelkin.github.io/articles/mandolin/
It is an open-source C++ library with cool synths, some with wavetables.
I am searching for an algorithm to determine whether real-time audio input matches one of 144 given (and comfortably distinct) phoneme pairs.
Preferably the lowest level that does the job.
I'm developing radical / experimental musical training software for iPhone / iPad.
My musical system comprises 12 consonant phonemes and 12 vowel phonemes, demonstrated here. That makes 144 possible phoneme pairs. The student has to sing the correct phoneme pair 'laa duu bee' etc in response to visual stimulus.
I have done a lot of research into this, and it looks like my best bet may be to use one of the iOS Sphinx wrappers ( iPhone App › Add voice recognition? is the best source of information I have found ). However, I can't see how I would adapt such a package; can anyone with experience using one of these technologies give a basic rundown of the steps that would be required?
Would training by the user be necessary? I would have thought not, as it is such an elementary task compared with full language models of thousands of words and a far larger and more subtle phoneme base. However, it would be acceptable (though not ideal) to have the user train 12 phoneme pairs: { consonant1+vowel1, consonant2+vowel2, ..., consonant12+vowel12 }. The full 144 would be too burdensome.
Is there a simpler approach? I feel like using a fully featured continuous speech recogniser is using a sledgehammer to crack a nut. It would be far more elegant to use the minimum technology that would solve the problem.
So really I'm hunting for any open source software that recognises phonemes.
PS: I need a solution that runs pretty much in real time, so even as they are singing the note, it first blinks to indicate that it picked up the phoneme pair that was sung, and then it glows to indicate whether they are singing the correct pitch.
If you are looking for a phone-level open source recogniser, then I would recommend HTK. Very good documentation is available with this tool in the form of the HTK Book. It also contains an entire chapter dedicated to building a phone level real-time speech recogniser. From your problem statement above, it seems to me like you might be able to re-work that example into your own solution. Possible pitfalls:
Since you want to build a phone-level recogniser, the amount of data needed to train the phone models would be very large. Also, your training database should be balanced in terms of the distribution of the phones.
Building a speaker-independent system would require data from more than one speaker. And lots of that too.
Since this is open source, you should also check the licensing info for any additional details about shipping the code. A good alternative would be to use the on-phone recorder and then send the recorded waveform over a data channel to a server for the recognition, pretty much like what Google does.
I have a little bit of experience with this type of signal processing, and I would say that this is probably not the type of finite question that can be answered definitively.
One thing worth noting is that although you may restrict the phonemes you are interested in, the possibility space remains the same (i.e. infinite-ish). User training might help the algorithms along a bit, but useful training takes quite a bit of time and it seems you are averse to too much of that.
Using Sphinx is probably a great start on this problem. I haven't gotten very far in the library myself, but my guess is that you'll be working with its source code yourself to get exactly what you want. (Hooray for open source!)
...using a sledgehammer to crack a nut.
I wouldn't label your problem a nut, I'd say it's more like a beast. It may be a different beast than natural language speech recognition, but it is still a beast.
All the best with your problem solving.
Not sure if this would help: check out OpenEars' LanguageModelGenerator. OpenEars uses Sphinx and other libraries.
http://www.hfink.eu/matchbox
This page links to both a YouTube video demo and the GitHub source.
I'm guessing it would still be a lot of work to mould it into the shape I'm after, but it definitely does do a lot of the work already.