I'm trying to create an 'auto DJ' application that would let smartphone users select a playlist of songs and then create a seamless mix for playback. There are a few steps involved: read a playlist of audio files, calculate their waveforms/spectra, determine the BPMs, and arrange the compatible songs in a new playlist in the order in which they will be played (based on compatible tempos and keys).
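The ordering step could be sketched roughly as follows, assuming the BPM of each track has already been estimated by some analysis library. The `Track` type, the sample values, and the greedy nearest-tempo strategy are illustrative assumptions, not an established approach; key compatibility is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Track:
    title: str
    bpm: float  # assumed to come from a separate tempo-detection step


def order_by_tempo(tracks, start_index=0):
    """Greedy ordering: begin with the chosen track, then repeatedly pick
    the remaining track whose BPM is closest to the one just played."""
    remaining = list(tracks)
    if not remaining:
        return []
    ordered = [remaining.pop(start_index)]
    while remaining:
        current = ordered[-1]
        nearest = min(remaining, key=lambda t: abs(t.bpm - current.bpm))
        remaining.remove(nearest)
        ordered.append(nearest)
    return ordered


playlist = [
    Track("A", 128), Track("B", 120), Track("C", 126),
    Track("D", 140), Track("E", 124),
]
mix = order_by_tempo(playlist)
print([t.title for t in mix])
```

A greedy pass like this keeps each tempo jump small but does not guarantee the best overall order; it is only meant to show the shape of the problem.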
The app would have to be able to scan the waveform of a song and recognize the beginning of the 'main' part of the song (skipping slow intros/outros). I also imagine having some effects: filtering, so it can filter the bass out of the new track being mixed in, and switch the basses at an appropriate time. Perhaps reverb that the user could control as well.
I'm just trying to gauge how feasible this project is for 3-4 busy college students over a span of ~4 months. I'm not sure whether it would be an Android or iOS app, or perhaps even a Windows app. I'm also not sure what language we would use (likely Python or Java); whichever has the most useful audio analysis libraries. Obviously it would work better for certain genres of music (house, trance), but I'd still really like to try to create this.
Thanks for any feedback
As much as I would like to hear a more experienced person's opinion on this, I would say that, based on your situation, it would be a very big undertaking. Since it sounds like you don't have experience with audio analysis libraries or programs, you might want to start by experimenting with those; most of them are likely to be in C/C++, not Java/Python. Here are some I know of, but I recommend doing your own research.
http://www.underbit.com/products/mad/
http://audacity.sourceforge.net/
It doesn't sound that feasible in your situation, but that depends on your programming/project experience and your motivation to create it.
Good luck
I'm working in AudioKit and want to understand how people have incorporated drums. Obviously, the sampler is an option, but I am wondering if there is a built-in option similar to some of the basic synthesis options.
There are a few options. I personally like the AppleSampler/MidiSampler approach used in the example, but instead of using audio files you can create an EXS Sampler instrument in Logic, where you can assign notes for different velocities. AppleSampler can also load AUPresets made in GarageBand and SoundFonts (SF2). The DunneAudioKit Sampler is an option if you are working with SFZ files, but I think that might be a work in progress in AudioKit 5. Loading WAV files directly into AppleSampler is also a good option if you just want one-shot sounds.
I'm assuming you're mostly talking about playback of samples, not recording.
The best built-in option I've seen (other than AppleSampler/MidiSampler) is AudioPlayer, which lets you load a sample and play it back on demand (from an on-screen pad, etc.). MIDIListener can then help you respond to external MIDI events. It works (I have a pretty big branch in my app where I tried it), but I'm not sure it works well.
I wouldn't recommend DunneAudioKit Sampler for drums. There is no one-shot playback (so playing the same note in quick succession will cut off the previous note, even if you mess with the release). If you're trying to build a complex/realistic acoustic drum instrument, you'll also want round-robins so that variations of the same hit can be played, which Dunne also doesn't have. It can load SFZ files, but only a very limited subset of SFZ's opcodes (so again, it's missing things like round robins, mute groups, one-shot, etc).
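To make the round-robin idea concrete, here is a small language-agnostic sketch in Python. It is not AudioKit code: the `RoundRobinPad` class and the sample file names are hypothetical, and actual playback is left out.

```python
import itertools


class RoundRobinPad:
    """Cycles through recorded variations of the same drum hit so that
    repeated strikes do not replay the identical sample."""

    def __init__(self, sample_names):
        names = list(sample_names)
        if not names:
            raise ValueError("need at least one sample")
        self._cycle = itertools.cycle(names)

    def next_sample(self):
        # Returns the name of the variation to trigger for this hit.
        return next(self._cycle)


snare = RoundRobinPad(["snare_a.wav", "snare_b.wav", "snare_c.wav"])
hits = [snare.next_sample() for _ in range(4)]
print(hits)
```

In a sampler that supports round-robins, this selection happens inside the instrument; the sketch only shows the selection logic.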
Having gone down all those roads, I would suggest starting with AppleSampler, and I would build the EXS or aupreset file in Logic or MainStage rather than trying to build something programmatically.
If your needs are really simple, the examples in AudioKit's recently released drum pad playground are a great place to start; they load single samples onto a specific note on AppleSampler.
I am trying to build an app that allows the user to record individual people speaking, save the recordings on the device, and tag each recording with the name of the person who spoke. Then there is a detection mode, in which I record someone and the app can tell me their name if they are in the local database.
First of all - is this possible at all? I am very new to iOS development and not so familiar with the available APIs.
More importantly, which API should I use (ideally free) to match the incoming voice against the recordings I have in the local database? This should behave something like Shazam, but be much simpler, since the database I am matching against is much smaller.
If you're new to iOS development, I'd start with the core app to record the audio and let people manually choose a profile/name to attach it to and worry about the speaker recognition part later.
You obviously have two options for the recognition side of things: You can either tie in someone else's speech authentication/speaker recognition library (which will probably be in C or C++), or you can try to write your own.
How many people are going to use your app? You might be able to create something basic yourself: if it's the difference between a man and a woman, you could probably figure that out by doing an FFT spectral analysis of the audio and finding where the frequency peaks are. Obviously the frequencies used to enunciate different phonemes are going to vary somewhat, so solving the general case for two people who sound fairly similar is probably hard. You'll need to train the system with a bunch of text and build some kind of model of frequency distributions. You could try clustering or something similar, but you're going to run into a fair bit of maths fairly quickly (Gaussian mixture models, etc.). There are libraries/projects that will do this. You might be able to port this from MATLAB, for example: https://github.com/codyaray/speaker-recognition
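As a minimal sketch of the "find the frequency peak" idea, the following pure-Python code runs a naive DFT over a short buffer and reports the strongest frequency. The function name, the synthetic test tones, and the 8000 Hz sample rate are assumptions made for illustration; real speech would need windowing, averaging over many frames, and an optimized FFT library rather than this slow loop.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest bin in a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip the DC bin, stop before Nyquist
        acc = 0j
        for i, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def tone(freq, sample_rate=8000, n=400):
    """Synthetic stand-in for a voiced sound with one fundamental."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


low_voice = dominant_frequency(tone(120), 8000)
high_voice = dominant_frequency(tone(200), 8000)
print(low_voice, high_voice)
```

Comparing the peak against a threshold would give only a crude low-pitch versus high-pitch split, which is why the general case quickly calls for statistical models.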
If you want to take something off-the-shelf, I'd go with a straight C library like mistral, as it should be relatively easy to call into from Objective-C.
The SpeakHere sample code should get you started for audio recording and playback.
Also, it may well take longer for the user to train your app to recognise them than it's worth in time-saving from just picking their name from a list. Unless you're intending their voice to be some kind of security passport type thing, it might just not be worth bothering with.
I have a question about writing software that lets people on different platforms play the same online game together. The easiest way to explain it: I want to build a platform on which both PlayStation and Xbox players can play Modern Warfare 3 online together.
Mathematically it seems possible: in the end you have two different screens showing the same thing. Sony and Microsoft players would stream their code or screen to the platform and play together. The big problem is that you receive it in two different codes, which you have to translate into one common language in less than 0.001 seconds.
Honestly, I still have to learn a lot about this, and I can't get much further on my own.
Do you have any tips, other forums, or solutions for this problem? Maybe the answer is writing a new language? (Technically, Google does this for translating over the phone.)
Depending on the game this might not be possible even in theory. Many console games use a peer-to-peer lock-step synchronization model for multiplayer. Games that use this approach only send each other the player input from the other consoles and rely on deterministic simulation (the same inputs produce the same outputs) to keep the systems synchronized.
This only works when the exact same compiled code is running on the same CPU for all players. Games with this networking model usually have periodic desynch checks to make sure that the different systems haven't drifted out of sync with each other. A desynch failure is usually considered a fatal error and either a bug in the game or evidence of attempted cheating by one of the players.
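The lock-step model described above can be illustrated with a toy simulation. This is a sketch only: the `Simulation` class, its integer state, and the SHA-256 checksum are invented for illustration and do not reflect any real game's netcode.

```python
import hashlib


class Simulation:
    """A toy deterministic game state: one integer position per player."""

    def __init__(self, num_players):
        self.positions = [0] * num_players
        self.tick = 0

    def step(self, inputs):
        # inputs[i] is player i's move this tick (-1, 0 or +1).
        # Only these inputs would be sent over the network.
        for player, move in enumerate(inputs):
            self.positions[player] += move
        self.tick += 1

    def checksum(self):
        # Peers exchange this value periodically to detect a desync.
        data = f"{self.tick}:{self.positions}".encode()
        return hashlib.sha256(data).hexdigest()


inputs_per_tick = [(1, 0), (1, -1), (0, -1)]
console_a, console_b = Simulation(2), Simulation(2)
for inputs in inputs_per_tick:
    console_a.step(inputs)
    console_b.step(inputs)
    if console_a.checksum() != console_b.checksum():
        raise RuntimeError("desync detected")
print(console_a.positions)
```

Both peers stay in sync here only because they run identical code on identical inputs; any difference in how a step is computed would change the checksum, which is the failure the desync checks look for.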
Other multiplayer games use a client-server model, so it would be possible in theory to allow different consoles to play against each other. Reverse engineering the network protocol would be a formidable technical challenge, however, and getting it to work reliably would be difficult.
Even if you could solve the technical problems, you would likely have even bigger legal issues to overcome. Sony and Microsoft don't want to allow cross-platform play, so even though it would be possible in theory with a client-server multiplayer game, developers aren't able to implement it. A third party trying to make this work would likely face legal challenges from Microsoft, Sony, and the game developer.
I've read quite a bit both here (Audio Framework in iPhone) and abroad but am still confused as to which Audio Framework to use.
I'm able to get some of the easier things done, like recording and playing back, but I'm looking to the future of the app, where I'll be doing more complex things, like managing past recordings (although maybe that's an NSURL bookmark thing) and editing audio.
Right now I'm using AVFoundation but have started reading the docs for Core Audio (and there's also AudioToolbox). I wish there was a developer doc called "Understanding the Different Audio Frameworks and How and When to use them" because, well, the docs are dense and I'm having trouble figuring out which path to go down.
Links to good docs would also be much appreciated!
I recommend you take a look at the recent Learning Core Audio book. Its purpose is to clear up the confusion around audio frameworks on Mac OS and iOS. If you want "good docs", it's well worth getting.
Depending on your requirements, you might also want to consider some of the non-Apple audio frameworks, particularly the MoMu release of STK, which in many respects will be simpler and easier to use than Apple's frameworks.
I'm developing a virtual instrument app for iOS and am trying to implement a recording function so that the app can record and playback the music the user makes with the instrument. I'm currently using the CocosDenshion sound engine (with a few of my own hacks involving fades etc) which is based on OpenAL. From my research on the net it seems I have two options:
1. Keep a record of the user's inputs (i.e. which notes were played at what volume) so that the app can recreate the sound (but this cannot be shared/emailed).
2. Hack my own low-level sound engine using AudioUnits, specifically RemoteIO, so that I manually mix all the sounds and populate the final output buffer by hand, and hence can save that buffer to a file. This could then be shared by email etc.
I have implemented a RemoteIO callback for rendering the output buffer in the hope that it would give me previously played data in the buffer, but alas the buffer is always all zeros.
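The two options are not mutually exclusive: a log of note events can be rendered offline into a buffer and saved to a file. Below is a sketch of that idea in Python, not iOS code. The event tuple layout, the sine-wave "instrument", and the output file name are assumptions for illustration, and nothing here uses RemoteIO or CocosDenshion.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100


def render(events, duration):
    """Mix note events into one mono buffer of floats.

    Each event is (start_seconds, frequency_hz, volume, length_seconds).
    """
    buffer = [0.0] * int(duration * SAMPLE_RATE)
    for start, freq, volume, length in events:
        first = int(start * SAMPLE_RATE)
        count = int(length * SAMPLE_RATE)
        for i in range(count):
            if first + i >= len(buffer):
                break
            # Summing into the buffer is the "manual mixing" step.
            buffer[first + i] += volume * math.sin(
                2 * math.pi * freq * i / SAMPLE_RATE
            )
    return buffer


def save_wav(path, buffer):
    """Write the buffer as 16-bit mono PCM, clipping to [-1, 1]."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in buffer
    )
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(SAMPLE_RATE)
        out.writeframes(frames)


# A recorded performance: two overlapping notes.
performance = [(0.0, 440.0, 0.5, 0.1), (0.05, 660.0, 0.3, 0.1)]
mixed = render(performance, 0.2)
save_wav("performance_sketch.wav", mixed)
```

The event list stays small and can be replayed inside the app, while the rendered file is what would be shared.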
So my question is: is there an easier way to sniff/listen to what my app is sending to the speakers than my option 2 above?
Thanks in advance for your help!
I think you should use RemoteIO. I had a similar project several months ago and wanted to avoid RemoteIO and audio units as much as possible, but in the end, after writing tons of code and reading lots of documentation for third-party libraries (including CocosDenshion), I ended up using audio units anyway. Besides, it's not that hard to set up and work with. If, however, you are looking for a library to do most of the work for you, look for one built on top of Core Audio, not OpenAL.
You might want to take a look at the AudioCopy framework. It does a lot of what you seem to be looking for, and will save you from potentially reinventing some wheels.