I am having trouble finding out how to read frequencies from audio input. I am trying to listen for very high (ultrasonic) frequencies. I've explored several GitHub projects, but all of them were either outdated or broken.
I discovered this guide, but I am having trouble understanding it: https://developer.apple.com/documentation/accelerate/finding_the_component_frequencies_in_a_composite_sine_wave Can anyone provide guidance? Has anyone done this before? Thanks
It's worth digging into this piece of sample code: https://developer.apple.com/documentation/accelerate/visualizing_sound_as_an_audio_spectrogram
The sample calculates the Nyquist frequency of the microphone's signal (half the sample rate); for example, your device might capture frequencies up to 20 kHz. You can then look at the magnitude values in each frequency-domain frame of samples and find the maximum to derive the dominant frequency.
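As a sketch of that last step, assuming you already have one frame of FFT magnitudes (computed elsewhere, e.g. with vDSP_fft_zrip), the peak search could look like this; dominantFrequency and the parameter names are made up, only vDSP_maxvi is a real Accelerate call:

#include <Accelerate/Accelerate.h>

// Find the dominant frequency in one frame of FFT magnitudes.
// binCount is fftSize / 2; bin i is centered at i * sampleRate / fftSize Hz,
// so the top bin sits just below the Nyquist frequency (sampleRate / 2).
double dominantFrequency(const float *magnitudes, vDSP_Length binCount,
                         double sampleRate, vDSP_Length fftSize) {
    float peakValue;
    vDSP_Length peakIndex;
    vDSP_maxvi(magnitudes, 1, &peakValue, &peakIndex, binCount);
    return (double)peakIndex * sampleRate / (double)fftSize;
}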
My friend Prasad Raghavendra and I were trying to experiment with machine learning on audio.
We were doing it to learn and to explore interesting possibilities for upcoming get-togethers.
I decided to see whether deep learning, or any machine learning, could be fed audio clips rated by humans (an evaluation score).
To our dismay, we found that the problem had to be split up to accommodate the dimensionality of the input.
So we decided to discard the vocals and assess songs by their accompaniments, on the assumption that vocals and instruments are always correlated.
We looked for an MP3/WAV-to-MIDI converter. Unfortunately, the ones on SourceForge and GitHub handle only single instruments, and the remaining options are paid products (Ableton Live, Fruity Loops, etc.). We decided to treat this as a sub-problem.
We thought of FFTs, band-pass filters, and a moving window to handle this.
But we don't understand how to go about separating the instruments when chords are played and there are 5-6 instruments in the file.
1) What algorithms can I look at?
2) My friend knows how to play keyboard, so I will be able to get MIDI data. But are there any datasets meant for this?
3) How many instruments can these algorithms detect?
4) How do we split the audio? We have neither multiple recordings nor the mixing matrix.
We were also thinking about finding the patterns of accompaniments and playing those accompaniments in real time while singing along. I guess we will be able to think about that once we get answers to 1) through 4). (We are considering both chord progressions and Markovian dynamics.)
Thanks for all help!
P.S.: We also tried an FFT, and we are able to see some harmonics. Is that due to the sinc() that shows up in the FFT when a rectangular wave is the time-domain input? Can that be used to determine timbre?
We were able to formulate the problem roughly, but we are still finding it difficult to pin down. If we work in the frequency domain at a given frequency, the instruments are indistinguishable: a trombone playing at 440 Hz and a guitar playing at 440 Hz have the same frequency, differing only in timbre, and we still do not know how to determine timbre. So we decided to work in the time domain by considering notes. If a note crosses into another octave, we would encode that as a separate dimension: +1 for the next octave, 0 for the current octave, and -1 for the previous octave.
If notes are represented by letters such as 'A', 'B', 'C', etc., then the problem reduces to mixing matrices:
O = M·I during training.
M is the mixing matrix, to be estimated from the known output O and the known input I of the MIDI file.
During prediction, though, M must be replaced by a probability matrix P generated from the previously learned M matrices.
The problem then reduces to I_predicted = P⁻¹·O. The error reduces to the LMSE of I, and we can use a DNN to adjust P via back-propagation.
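In matrix form, a standard least-squares fit of M during training would be (assuming I·Iᵀ is invertible; this is only a sketch of the fit, not a full solution):

O = M I, \qquad \hat{M} = O I^{\top} \left( I I^{\top} \right)^{-1}, \qquad \hat{I}_{\mathrm{predicted}} = P^{-1} O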
But this approach assumes that the notes 'A', 'B', 'C', etc. are already known. How do we detect them instantaneously, or within a small duration such as 0.1 seconds? Template matching may not work because of the harmonics. Any suggestions would be much appreciated.
Splitting out the different parts is a machine learning problem all its own. Unfortunately, you can't look at this problem in audio land only; you must consider the music.
You need to train something to understand musical patterns and progressions in the context of the type of music you give it. It needs to understand what the different instruments sound like, both mixed and not mixed. It needs to understand how these instruments are often played together, if it's going to have any chance at all at separating what's going on.
This is a very, very difficult problem.
This is a very hard problem, mainly because converting audio to pitch isn't simple: Nyquist folding aliases harmonics above 22 kHz back down into the band, and saturators, distortion, and other analogue equipment introduce harmonics of their own.
The fundamental isn't always the loudest harmonic, which is why your plan will not work.
The hardest thing to measure would be a distorted guitar; the harmonics some pedals/plugins can produce are crazy.
OK, let me try to rephrase this:
I'm looking for a method that takes an audio file as input and outputs a list of transients (distinctive peaks), based on a given sensitivity.
The audio is a recording of a spoken phrase of, for example, five words. The method would return a list of numbers (e.g., sample offsets or milliseconds) marking where the words start. My ultimate goal is to play back each word individually.
As suggested in a comment (I really struck a negative chord here), I am NOT asking anyone to write any code for me.
I've been around this forum a while now, and the community has always been very helpful. The most helpful answers were those that pointed out my rigid way of thinking and offered surprising alternatives or workarounds based on their own experiences.
I guess this topic is just too much of a niche.
Before Edit:
For my iOS app, I need to programmatically cut up a spoken phrase into words for further processing. I know what words to expect, so I can make some assumptions about where words would start.
However, in any case, a transient detection algorithm/method would be very helpful.
Google points me to either commercial products, or highly academic papers that are beyond my brain power.
Luckily, you are much smarter and more knowledgeable than I am, so you can help simplify my problems.
Don't let me down!
There are a couple of simple basic ideas you can put to work here.
First, take the input audio and divide it into small buckets (on the order of tens of milliseconds). For each bucket, compute the power of the samples in it by summing the squares of the sample values.
For example, say you have 16-bit samples at 44.1 kHz in an array called s. One second's worth of data is 44,100 samples, so a 10 ms bucket size gives you 441 samples per bucket. To compute the power of one bucket, you could do this:
float power = 0.0f;
for (int i = 0; i < 441; i++) {
    // Normalize the 16-bit sample to [-1.0, 1.0), then accumulate its square.
    float normalized = (float)s[i] / 32768.0f;
    power += normalized * normalized;
}
Once you build an array of power values, you can look at relative changes in power from bucket to bucket to do basic signal detection.
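For instance, a minimal sketch of that bucket-to-bucket comparison might look like this (the sensitivity ratio and the noise-floor tracking are just one possible scheme, not a tested recipe, and all names are made up):

// Mark the bucket indices where power jumps well above a slowly tracked
// noise floor; multiply an index by the bucket length to get milliseconds.
int detectOnsets(const float *power, int bucketCount, float sensitivity,
                 int *onsets) {  // onsets: out-array of bucket indices
    int count = 0;
    int inWord = 0;
    float noiseFloor = power[0] > 1e-6f ? power[0] : 1e-6f;
    for (int b = 1; b < bucketCount; b++) {
        if (!inWord && power[b] > sensitivity * noiseFloor) {
            onsets[count++] = b;          // a word (transient) starts here
            inWord = 1;
        } else if (inWord && power[b] < 1.5f * noiseFloor) {
            inWord = 0;                   // power fell back toward the floor
        }
        if (!inWord)                      // track the floor only in silence
            noiseFloor = 0.95f * noiseFloor + 0.05f * power[b];
    }
    return count;
}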
Good luck!
Audio analysis is a very complex topic. You could easily detect individual words and slice them apart, but actually identifying them requires a lot of processing and advanced algorithms.
Sadly, there is not much we can tell you beyond the fact that there is no way around it. You said you found commercial products, and I would suggest going with those. Papers are not always complete enough or right for the language/platform/use case you want, and they often lack the details needed for a proper implementation by someone without prior knowledge of the topic.
You may be lucky and find an open source implementation that suits your needs. Here's what a little bit of research returned:
How to use Speech Recognition inside the iOS SDK?
free speech recognition engines for iOS?
You'll quickly see speech recognition is not something you should start from scratch. Choose a library, try it for a little bit and see if it works!
I am trying to create an iOS app that will perform an action when it detects a clapping sound.
Things I've tried:
1) My first approach was to simply measure the overall power using an AVAudioRecorder. This worked OK, but it could be set off by loud talking, other noises, etc., so I decided to take a different approach.
2) I then implemented some code that uses an FFT to get the frequency and magnitude of the live streaming audio from the microphone. I found that the clap spike generally resides in the 13 kHz-20 kHz range, while most talking sits at much lower frequencies. I then implemented a simple threshold on this frequency range (see the sketch after this list), and this worked OK, but other sounds could set it off. For example, dropping a pencil on the table right next to my phone would pass this threshold and be counted as a clap.
3) I then tried splitting this frequency range up into a couple hundred bins and collecting enough data that, when a sound passed the threshold, my app could calculate a z-score and, if the z-score was good enough, count it as a clap. This did not work at all: some claps were not recognized, and some other sounds were.
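Roughly, the threshold in 2) looked like this (a simplified reconstruction; the FFT code is not shown and all names here are placeholders):

// Sum the FFT magnitudes in the 13-20 kHz band and compare to a threshold.
int looksLikeClap(const float *magnitudes, int fftSize, float sampleRate,
                  float threshold) {
    float binWidth = sampleRate / (float)fftSize;
    int lo = (int)(13000.0f / binWidth);
    int hi = (int)(20000.0f / binWidth);
    float bandPower = 0.0f;
    for (int i = lo; i <= hi && i < fftSize / 2; i++)
        bandPower += magnitudes[i];
    return bandPower > threshold;  // crude: a dropped pencil also passes
}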
Graph:
To try to help me understand how to detect claps, I made this graph in Excel (each graph has around 800 data points); it covers the 13 kHz-21 kHz range:
Where I am now:
Even after all of this, I am still not seeing how to distinguish a clap from other sounds.
Any help is greatly appreciated!
I used this tutorial to create a small pitch-detection app; however, I'd like it to recognize the loudest pitch instead of the highest pitch (within a certain frequency range).
I would therefore need to get the amplitude at the current pitch to create a new bin filter...
Any thoughts on how I could realize this?
As this already uses Core Audio (RemoteIO) based samples, it seems unreasonable to use the frequently suggested AVAudioPlayer...
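What I have in mind is something like this band-limited loudest-bin search (a sketch only; the names and the per-frame magnitude layout are assumptions about my setup):

typedef struct { float frequency; float magnitude; } Peak;

// Find the loudest bin between fMin and fMax Hz and report its frequency
// and magnitude (the amplitude that would feed the new bin filter).
Peak loudestInRange(const float *magnitudes, int binCount, float sampleRate,
                    int fftSize, float fMin, float fMax) {
    float binWidth = sampleRate / (float)fftSize;
    int lo = (int)(fMin / binWidth);
    int hi = (int)(fMax / binWidth);
    if (hi >= binCount) hi = binCount - 1;
    Peak p = { lo * binWidth, magnitudes[lo] };
    for (int i = lo + 1; i <= hi; i++) {
        if (magnitudes[i] > p.magnitude) {
            p.magnitude = magnitudes[i];
            p.frequency = i * binWidth;
        }
    }
    return p;
}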
Any help is very much appreciated! Thank you guys!!
I want to record the user's voice and run an FFT on it so that I can get some frequency values and calculate the highest tone of the recording. Has anybody done anything related to this on BlackBerry? It would be great if I could get some help with this.
Check out my Google Code Project for real time FFT computation. You should be able to modify the code to work for you.