How to detect claps in iOS?

I am working on making an app that performs an action when the sound of a clap is recognized. I have looked into simply measuring the average and peak power from an AVAudioRecorder and this works okay, but if there are other sounds then it reports lots of false positives. I believe I need some kind of audio fingerprinting for this to work while other audio is playing. Now I know that this has been asked a lot before on SO, but most of the answers say something along the lines of "Use FFT" and then the person says "Oh okay!" but no clear explanation is given and I still have no idea how to correctly identify sounds using an FFT.
Can anyone clearly explain, cite another tutorial, or post a link to a library that can identify sounds using audio fingerprinting?
Thanks!
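
One way to make the bare "use FFT" advice concrete: a clap is a broadband transient, so on top of an energy spike you can check that the frame's magnitude spectrum is flat (noise-like) rather than peaked (tonal). A minimal C sketch, assuming you have already computed a magnitude spectrum mag with bins bins for the current frame (the function name is illustrative):

#include <math.h>

// Spectral flatness = geometric mean / arithmetic mean of the magnitude
// spectrum. Values near 1.0 mean a noise-like (broadband) frame, values
// near 0.0 a tonal one. Claps tend to score high, while music and speech
// score lower, which helps filter the false positives that a pure
// energy threshold produces.
static float spectralFlatness(const float *mag, int bins)
{
    double logSum = 0.0, sum = 0.0;
    for (int i = 0; i < bins; i++) {
        double m = mag[i] + 1e-12;   // epsilon avoids log(0)
        logSum += log(m);
        sum    += m;
    }
    double geoMean   = exp(logSum / bins);
    double arithMean = sum / bins;
    return (float)(geoMean / arithMean);
}

Gating a detection on both a peak-power spike and a flatness above, say, 0.5 is still a heuristic, but it is considerably more clap-specific than power alone.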

Related

Robot Framework: how to compare sound/video files

I have a sound/video source file, and I have to verify that my program, which opens and plays this file, works correctly.
I don't know how to verify a file like this!
I think I should capture the (sound/video) output and then compare it to the source file.
So far I've searched the internet but haven't found any solution.
This is going to be a real challenge for you. I personally have never done this, but hopefully I can provide you with some help to set you on your way...
First, you need to know that Robot Framework runs on Python, so anything you use will need to be written in Python or have Python bindings; asking in the Python community may be a good start.
In terms of capturing sound, I believe it would be easier to use a program with an API you can call. I found a document of someone doing this; as to whether it is still current, I am not sure:
http://www.nektra.com/files/DirectSound_Capture_With_Deviare.pdf
For video capture try looking here:
https://www.youtube.com/watch?v=j344j34JBRs
Next would be stripping the video apart: separating the audio from the video frames and comparing them separately. For this you are going to need a video editor, an audio comparison library, and a tool for comparing images.
As for how this would work in practice, I don't know, as I have never done it...
Why do you need to do this, though; is there not a better way? Does your application make the video? In that case, could some checks on frames, length, and file size suffice? You need to provide more information.
This is a bit long for a comment, but as an answer it is admittedly incomplete.
Let me know how you get on!

Sound Recognition with iOS 7?

I want to build an app that responds to the sound you make when blowing out birthday candles. This is not speech recognition per se (that sound isn't a word in English), and the very kind Halle over at OpenEars told me that it's not possible using that framework. (Thanks for your quick response, Halle!)
Is there a way to "teach" an app a sound such that the app can subsequently recognize it?
How would I go about this? Is it even doable? Am I crazy or taking on a problem that is much more difficult than I think it is? What should my homework be?
The good news is that it's achievable and you don't need any third party frameworks—AVFoundation is all you really need.
There's a good article from Mobile Orchard that covers the details but, somewhat inevitably for a four-year-old article, there are some gotchas you need to be aware of.
Before recording on a real device, I found I needed to set the audio session category, like so:
[[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord error:nil];
Play around with the threshold in this line:
if (lowPassResults > 0.95)
I found 0.95 to be too high and got better results setting it somewhere between 0.55 and 0.75. Similarly, I played around with the 0.05 multiplier in this line:
double peakPowerForChannel = pow(10, (0.05 * [recorder peakPowerForChannel:0]));
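For context, the metering loop from that article looks roughly like this (reconstructed from the tutorial, so treat it as a sketch; recorder is an AVAudioRecorder with metering enabled, and lowPassResults is an instance variable that persists between timer fires):

// Fired by an NSTimer every few hundredths of a second while recording.
- (void)levelTimerCallback:(NSTimer *)timer {
    [recorder updateMeters];

    // Convert peak power in dB to a roughly 0..1 linear scale.
    const double ALPHA = 0.05;
    double peakPowerForChannel = pow(10, (0.05 * [recorder peakPowerForChannel:0]));

    // Simple low-pass filter to smooth out momentary spikes.
    lowPassResults = ALPHA * peakPowerForChannel + (1.0 - ALPHA) * lowPassResults;

    if (lowPassResults > 0.95) {   // the threshold discussed above
        NSLog(@"Sound detected");
    }
}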
Using simple thresholds on energy levels would probably not be robust enough for your use case.
A good way to go about this would be to first extract some properties from the sound stream that are specific to the sound of blowing out candles. Then use a machine learning algorithm to train a model based on training examples (a set of recordings of the sound you want to recognize), which can then be used to classify snippets of sound coming into your microphone in real-time when using the application.
Given the possible environmental sounds going on while you blow out candles (birthdays are always noisy, aren't they?), it may be difficult to train a model that is robust enough to these background sounds. This is not a simple problem if you care about accuracy.
It may be doable though:
Forgive the self-promotion, but my company developed an SDK that provides an answer to exactly the question you are asking: "Is there a way to 'teach' an app a sound such that the app can subsequently recognize it?"
I am not sure if the specific sound of blowing out candles would work, as the SDK was primarily aimed at applications involving somewhat percussive sounds, but it might still work for your case. Here is a link, where you will also find a demo program you can download and try if you like: SampleSumo PSR SDK
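
Coming back to the feature-extraction step described above: a minimal sketch of two crude frame-level features, RMS energy and zero-crossing rate (the helper is illustrative; real systems typically use richer spectral features such as MFCCs):

#include <math.h>

// Computes two simple features for one frame of mono float samples.
// A vector of such features per frame is what you would hand to a
// trained classifier.
static void extractFeatures(const float *frame, int n,
                            float *outRMS, float *outZCR)
{
    float sumSquares = 0.0f;
    int crossings = 0;
    for (int i = 0; i < n; i++) {
        sumSquares += frame[i] * frame[i];
        if (i > 0 && (frame[i - 1] < 0.0f) != (frame[i] < 0.0f))
            crossings++;   // sign change between consecutive samples
    }
    *outRMS = sqrtf(sumSquares / n);          // loudness of the frame
    *outZCR = (float)crossings / (n - 1);     // crude brightness/noisiness
}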

Finding a single frequency in a sound file (Xcode)

For an app I'm working on, I need to be able to 'search' a sound file to find a particular frequency.
Basically, the iPhone mic records for 5 seconds and writes to a lossless audio file. I then need to 'open' that file and search for a particular frequency. The frequency is very specific (e.g. not a range between 15 Hz and 300 Hz; it is a fixed number).
Also, as soon as the frequency is found, the search can stop.
I've got the iPhone recording the sound and writing it to the file; I am just unsure how to open that file and search for the frequency.
Any help would be greatly appreciated!
Thanks
Check out PitchDetector and the related tutorial.
Can't get you any closer :)
If you're just looking for a specific, fixed, pure tone then the simplest method is probably Goertzel's algorithm - it's very simple to implement and is relatively lightweight computationally compared to using e.g. an FFT or autocorrelation method.
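A minimal C sketch of Goertzel's algorithm, assuming mono float samples (the function name and parameters are illustrative):

#include <math.h>

// Returns the power at targetFreq (in Hz) within a block of n samples.
// Scan the file block by block, compare each result against a
// calibrated threshold, and stop as soon as a block exceeds it.
static double goertzelPower(const float *samples, int n,
                            double targetFreq, double sampleRate)
{
    int    k     = (int)(0.5 + (n * targetFreq) / sampleRate); // nearest DFT bin
    double omega = (2.0 * M_PI * k) / n;
    double coeff = 2.0 * cos(omega);

    double sPrev = 0.0, sPrev2 = 0.0;
    for (int i = 0; i < n; i++) {
        double s = samples[i] + coeff * sPrev - sPrev2;
        sPrev2 = sPrev;
        sPrev  = s;
    }
    return sPrev * sPrev + sPrev2 * sPrev2 - coeff * sPrev * sPrev2;
}

Pick the block size n so that the bin spacing (sampleRate / n) is fine enough to separate your target frequency from its neighbours.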

XNA | C# : Record and Change the Voice

My aim is to code a project which records a human voice and changes it (with effects).
E.g.: a person records their voice over the microphone (speaks for a while), and then the program makes it sound like a baby.
This should run effectively and fast (the altering operation must run while recording, too).
What is the optimum way to do it ?
Thanks
If you're looking for either XNA or DirectX to do this for you, I'm pretty sure you're going to be out of luck (I don't have much experience with DirectSound; maybe somebody can correct me). What it sounds like you want to do is realtime digital signal processing, which means that you're either going to need to write your own code to manipulate the raw waveform, or find somebody else who's already written the code for you.
If you don't have experience writing this sort of thing, it's probably best to use somebody else's signal processing library, because this sort of thing can quickly get complicated. Since you're developing for the PC, you're in luck; you can use any library you like using P/Invoke. You might try out some of the solutions suggested here and here.
MSDN has some info about the Audio namespace from XNA, and the audio recording introduced in version 4:
Working with Microphones
Recording Audio from a Microphone
Keep in mind that recorded data is returned in PCM format.
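
Since the microphone hands you raw PCM, the crudest possible voice effect is easy to sketch: resample the recorded buffer so it plays back faster, which raises the pitch (a rough approximation of a "baby" voice, at the cost of shortening the clip; proper voice changers use techniques like PSOLA or phase vocoders to preserve duration). A language-agnostic sketch in C, with an illustrative function name:

// Resample 16-bit PCM by `factor` (e.g. 1.5 raises pitch about 7 semitones).
// Returns the number of output samples written.
int pitchShift(const short *in, int inLen,
               short *out, int outCap, float factor)
{
    int outLen = (int)(inLen / factor);
    if (outLen > outCap) outLen = outCap;

    for (int i = 0; i < outLen; i++) {
        float pos  = i * factor;               // fractional read position
        int   j    = (int)pos;
        float frac = pos - j;
        short a = in[j];
        short b = (j + 1 < inLen) ? in[j + 1] : a;
        out[i] = (short)(a + frac * (b - a));  // linear interpolation
    }
    return outLen;
}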

Virtual Instrument App Recording Functionality With RemoteIO

I'm developing a virtual instrument app for iOS and am trying to implement a recording function so that the app can record and playback the music the user makes with the instrument. I'm currently using the CocosDenshion sound engine (with a few of my own hacks involving fades etc) which is based on OpenAL. From my research on the net it seems I have two options:
1. Keep a record of the user's inputs (i.e. which notes were played at what volume) so that the app can recreate the sound (but this cannot be shared/emailed).
2. Hack together my own low-level sound engine using Audio Units, specifically RemoteIO, so that I manually mix all the sounds and populate the final output buffer by hand, and hence can save said buffer to a file. This can then be shared by email etc.
I have implemented a RemoteIO callback for rendering the output buffer in the hope that it would give me previously played data in the buffer but alas the buffer is always all 00.
So my question is: is there an easier way to sniff/listen to what my app is sending to the speakers than my option 2 above?
Thanks in advance for your help!
I think you should use RemoteIO. I had a similar project several months ago and wanted to avoid RemoteIO and Audio Units as much as possible, but in the end, after I had written tons of code and read lots of documentation for third-party libraries (including CocosDenshion), I ended up using Audio Units anyway. More than that, they're not that hard to set up and work with. If you do look for a library to do most of the work for you, look for one written on top of Core Audio, not OpenAL.
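To make that concrete: once a RemoteIO unit is set up, you don't even need to do the mixing in your own render callback to capture the output. You can attach a render-notify callback and copy the post-render buffers to a file. A minimal sketch, error handling omitted (gOutputFile is assumed to be an ExtAudioFileRef you opened elsewhere with a client format matching the unit's output stream format):

static OSStatus renderNotify(void                       *inRefCon,
                             AudioUnitRenderActionFlags *ioActionFlags,
                             const AudioTimeStamp       *inTimeStamp,
                             UInt32                      inBusNumber,
                             UInt32                      inNumberFrames,
                             AudioBufferList            *ioData)
{
    // The notify fires both before and after RemoteIO renders; only the
    // post-render pass contains the finished output samples.
    if ((*ioActionFlags & kAudioUnitRenderAction_PostRender) && inBusNumber == 0) {
        ExtAudioFileWriteAsync(gOutputFile, inNumberFrames, ioData);
    }
    return noErr;
}

// After the RemoteIO unit (ioUnit) is created and initialized:
AudioUnitAddRenderNotify(ioUnit, renderNotify, NULL);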
You might want to take a look at the AudioCopy framework. It does a lot of what you seem to be looking for, and will save you from potentially reinventing some wheels.
