Getting amplitude of current pitch using Core Audio - iOS

I used this tutorial to create a small pitch detection app; however, I'd like it to recognize the loudest pitch (within a certain frequency range) rather than the highest.
I'd therefore need to get the amplitude of the current pitch to create a new bin-filter...
Any thoughts on how I could realize this?
As this is already working with Core Audio (remoteIO) based samples, it seems unreasonable to fall back to the commonly suggested AVAudioPlayer...
Any help is very much appreciated! Thank you guys!!
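As a footnote, a minimal sketch of the band-limited search being asked about, assuming per-bin FFT magnitudes are already available from the tutorial's remoteIO/FFT code (the function and parameter names are illustrative, not from the tutorial):

```swift
import Foundation

// Given one frame of FFT magnitudes, find the loudest bin inside a
// frequency range and return its frequency and amplitude.
// Bin k is assumed to correspond to k * sampleRate / fftSize Hz.
func loudestPeak(magnitudes: [Float], sampleRate: Float, fftSize: Int,
                 range: ClosedRange<Float>) -> (frequency: Float, amplitude: Float)? {
    let binWidth = sampleRate / Float(fftSize)
    let lowBin = max(Int(range.lowerBound / binWidth), 0)
    let highBin = min(Int(range.upperBound / binWidth), magnitudes.count - 1)
    guard lowBin <= highBin else { return nil }

    var bestBin = lowBin
    for bin in lowBin...highBin where magnitudes[bin] > magnitudes[bestBin] {
        bestBin = bin
    }
    return (frequency: Float(bestBin) * binWidth, amplitude: magnitudes[bestBin])
}

// e.g. loudestPeak(magnitudes: mags, sampleRate: 44100, fftSize: 4096, range: 80...1000)
```

The returned amplitude could then drive the new bin filter mentioned above.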

Related

Working on a small project dealing with motion detection and sending alerts if displacement exceeds a certain boundary. Is it possible to implement YOLO?

This is the first mini project I am working on, and I have been searching for some information regarding YOLO. I want to know if we could train YOLO to recognise objects in a real-time webcam feed and set up a boundary (not to be confused with the bounding boxes) that sends out a simple alert if the object in question (in our case, a face) goes out of the boundary.
This is my first time asking here and I don't know if it is appropriate to do so; please let me know. In the meantime I will be reading APIs related to motion detection. If there are any suggestions, please do give them.
I would check out Shinobi (https://shinobi.video/), an open-source CCTV solution. I used it once to do motion detection, and I think it could be much easier for you than building something from scratch.
Here are some articles they have that sound related to what you are trying to do:
https://hub.shinobi.video/articles/view/JtJiGkdbcpAig40
https://hub.shinobi.video/articles/view/xEMps3O4y4VEaYk
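Whatever detector ends up producing the face box (YOLO, or a ready-made solution like Shinobi), the alert logic itself is simple geometry. A sketch in Swift for consistency with the other examples on this page - the type, names, and containment rule are all illustrative assumptions:

```swift
import CoreGraphics

// Fires an alert whenever a detected box leaves an allowed region.
struct BoundaryMonitor {
    let boundary: CGRect          // allowed region, in frame coordinates
    var onAlert: (CGRect) -> Void // called when the box leaves the region

    func check(faceBox: CGRect) {
        // Alert as soon as the detected box is no longer fully inside the
        // boundary; an intersection test would be a more lenient variant.
        if !boundary.contains(faceBox) {
            onAlert(faceBox)
        }
    }
}

// Hypothetical usage, with boxes coming from some detector's video loop:
let monitor = BoundaryMonitor(boundary: CGRect(x: 100, y: 100,
                                               width: 440, height: 280)) { box in
    print("Face left the boundary at \(box)")
}
monitor.check(faceBox: CGRect(x: 10, y: 10, width: 80, height: 80)) // alerts
```

In practice you would also debounce the alerts so a face hovering on the edge does not fire repeatedly.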

How to read audio frequency from the microphone in Swift (ultrasonic)?

I am having trouble finding out how to read frequencies from audio input. I am trying to listen to very high (ultrasonic) frequencies. I've explored several GitHub projects, all of which were either outdated or broken.
I discovered this guide (https://developer.apple.com/documentation/accelerate/finding_the_component_frequencies_in_a_composite_sine_wave), but I am having trouble understanding it. Can anyone provide guidance? Has anyone done this before? Thanks
It's worth digging into this piece of sample code: https://developer.apple.com/documentation/accelerate/visualizing_sound_as_an_audio_spectrogram
The sample calculates the Nyquist frequency (half the sample rate) for the microphone - for example, your device might only capture frequencies up to about 20 kHz. You can then look at the values in each frequency-domain page of samples and find the bin with the maximum magnitude to derive the dominant frequency.
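To make that concrete, a condensed sketch using the older vDSP FFT calls rather than the newer vDSP.FFT wrapper the sample uses; windowing and error handling are omitted, and the names are illustrative:

```swift
import Accelerate
import Foundation

// FFT one frame of mono microphone samples and return the dominant
// frequency. Assumes samples.count is a power of two.
func dominantFrequency(samples: [Float], sampleRate: Float) -> Float {
    let n = samples.count
    let halfN = n / 2
    let log2n = vDSP_Length(log2(Float(n)))

    let setup = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2))!
    defer { vDSP_destroy_fftsetup(setup) }

    var realp = [Float](repeating: 0, count: halfN)
    var imagp = [Float](repeating: 0, count: halfN)
    var magnitudes = [Float](repeating: 0, count: halfN)

    realp.withUnsafeMutableBufferPointer { realPtr in
        imagp.withUnsafeMutableBufferPointer { imagPtr in
            var split = DSPSplitComplex(realp: realPtr.baseAddress!,
                                        imagp: imagPtr.baseAddress!)
            // Pack the real signal into split-complex form, run the real
            // FFT in place, then take squared magnitudes per bin.
            samples.withUnsafeBufferPointer { buf in
                buf.baseAddress!.withMemoryRebound(to: DSPComplex.self,
                                                   capacity: halfN) {
                    vDSP_ctoz($0, 2, &split, 1, vDSP_Length(halfN))
                }
            }
            vDSP_fft_zrip(setup, &split, 1, log2n, FFTDirection(FFT_FORWARD))
            vDSP_zvmags(&split, 1, &magnitudes, 1, vDSP_Length(halfN))
        }
    }

    // Bin k corresponds to k * sampleRate / n, up to the Nyquist frequency.
    let peakBin = magnitudes.indices.max { magnitudes[$0] < magnitudes[$1] } ?? 0
    return Float(peakBin) * sampleRate / Float(n)
}
```

Note that ultrasonic detection is still capped at the Nyquist frequency, so a 44.1 kHz input stream cannot see anything above about 22 kHz.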

Recognize sound based on recorded library of sounds

I am trying to create an iOS app that will perform an action when it detects a clapping sound.
Things I've tried:
1) My first approach was to simply measure the overall power using an AVAudioRecorder. This worked OK, but it could get set off by talking too loudly, other noises, etc., so I decided to take a different approach.
2) I then implemented some code that uses an FFT to get the frequency and magnitude of the live streaming audio from the microphone. I found that the clap spike generally resides in the 13-20 kHz range, while most talking resides in much lower frequencies. I then implemented a simple threshold in this frequency range, and this worked OK, but other sounds could still set it off. For example, dropping a pencil on the table right next to my phone would pass this threshold and be counted as a clap.
3) I then tried splitting this frequency range up into a couple hundred bins and collecting enough data that, when a sound passed the threshold, my app could calculate the Z-score (a standard statistical measure) and, if the Z-score was high enough, count that as a clap (roughly the idea sketched after this list). This did not work at all: some claps were not recognized and some other sounds were recognized.
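For what it's worth, approach 3 can be expressed compactly as a running Z-score on band energy. A sketch with an illustrative window size and threshold; bandEnergy is assumed to be the summed 13-20 kHz FFT magnitude for one analysis frame:

```swift
import Foundation

// Flags a frame as a clap when its high-band energy is a statistical
// outlier relative to a rolling baseline of recent frames.
struct ZScoreClapDetector {
    private var history: [Double] = []
    let windowSize = 50 // frames of baseline to keep (illustrative)
    let threshold = 4.0 // Z-scores above this count as a clap (illustrative)

    mutating func process(bandEnergy: Double) -> Bool {
        defer {
            // Update the baseline after scoring, so the current frame
            // does not dilute its own statistics.
            history.append(bandEnergy)
            if history.count > windowSize { history.removeFirst() }
        }
        guard history.count >= windowSize else { return false } // warm-up
        let mean = history.reduce(0, +) / Double(history.count)
        let variance = history.map { ($0 - mean) * ($0 - mean) }
                              .reduce(0, +) / Double(history.count)
        let std = max(sqrt(variance), .ulpOfOne)
        return (bandEnergy - mean) / std > threshold
    }
}
```

One common refinement is to also require the spike to be short-lived, since a clap decays within a few frames while sustained noise does not.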
Graph:
To try to help me understand how to detect claps, I made this graph in Excel (each graph has around 800 data points); it covers the 13-21 kHz range:
Where I am now:
Even after all of this, I am still not seeing how to recognize a clap versus other sounds.
Any help is greatly appreciated!

Calculating the highest frequency of recorded voice on BlackBerry

I want to record the user's voice and run an FFT on it so that I can get some frequency values and calculate the highest tone of that recording. Has anybody done anything related to this on BlackBerry? It would be great if I could get some help with this.
Check out my Google Code project for real-time FFT computation. You should be able to modify the code to work for you.
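The linked project aside, once any FFT gives you per-bin magnitudes, the "highest tone" can be read off by scanning from the top bin down for the first one that clears a noise floor. A short sketch, in Swift for consistency with the rest of this page (BlackBerry code would be Java, but the logic is identical; the threshold is illustrative):

```swift
// Given per-bin FFT magnitudes, return the highest frequency whose
// magnitude rises above a noise floor, or nil if nothing does.
func highestTone(magnitudes: [Float], sampleRate: Float, fftSize: Int,
                 noiseFloor: Float = 0.01) -> Float? {
    guard let bin = magnitudes.indices.reversed()
            .first(where: { magnitudes[$0] > noiseFloor }) else { return nil }
    return Float(bin) * sampleRate / Float(fftSize) // bin index -> Hz
}
```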

Recognizing individual voices

I plan to write a conversation analysis software, which will recognize the individual speakers, their pitch and intensity. Pitch and intensity are somewhat straightforward (pitch via autocorrelation).
How would I go about recognizing individual speakers, so I can record his/her features? Will storing some heuristics for each speaker's frequencies be enough? I can assume that only one person speaks at a time (strictly non-overlapping). I can also assume that for training, each speaker can record a minute's worth of data before actual analysis.
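As an aside, the "pitch via autocorrelation" mentioned above can be as simple as picking the lag with the strongest self-similarity within a plausible pitch range. A minimal sketch; the frame size, search range, and names are illustrative:

```swift
import Foundation

// Estimate the pitch of one frame of mono audio by brute-force
// autocorrelation over the lags corresponding to minHz...maxHz.
func estimatePitch(samples: [Float], sampleRate: Float,
                   minHz: Float = 60, maxHz: Float = 400) -> Float? {
    let minLag = Int(sampleRate / maxHz)
    let maxLag = min(Int(sampleRate / minHz), samples.count - 1)
    guard minLag < maxLag else { return nil }

    var bestLag = 0
    var bestCorr: Float = 0
    for lag in minLag...maxLag {
        var corr: Float = 0
        for i in 0..<(samples.count - lag) {
            corr += samples[i] * samples[i + lag]
        }
        if corr > bestCorr {
            bestCorr = corr
            bestLag = lag
        }
    }
    return bestLag > 0 ? sampleRate / Float(bestLag) : nil
}
```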
Pitch and intensity on their own tell you nothing; you really need to analyse how pitch varies. In order to identify different speakers you need to transform the speech audio into some kind of feature space and then make comparisons against your database of speakers in this feature space. The general term you might want to Google for is prosody - see e.g. http://en.wikipedia.org/wiki/Prosody_(linguistics). While you're Googling, you might also want to read up on speaker identification, a.k.a. speaker recognition; see e.g. http://en.wikipedia.org/wiki/Speaker_identification
If you are still working on this... are you using speech recognition on the sound input? Microsoft SAPI, for example, provides the application with a rich API for digging into the speech sound wave, which could make the speaker-recognition problem more tractable. I think you can get phoneme positions within the waveform.

That would let you do power-spectrum analysis of vowels, for example, which could be used to generate features to distinguish speakers. (Before anybody starts muttering about pitch and volume, keep in mind that formant curves come from vocal-tract shape and are fairly independent of pitch, which is vocal-cord frequency, and that the relative position and relative amplitude of formants are (relatively!) independent of overall volume.) Phoneme duration in context might also be a useful feature. Energy distribution during 'n' sounds could provide a 'nasality' feature. And so on.

Just a thought. I expect to be working in this area myself.
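To make the feature-space idea concrete: a naive baseline is to average features over each speaker's training minute and assign an utterance to the nearest stored profile. A toy sketch; all names and the distance threshold are assumptions, and real systems use richer features (e.g. MFCCs) and proper statistical models rather than a single mean vector:

```swift
import Foundation

// One enrolled speaker, reduced to a mean feature vector computed
// from their minute of training audio.
struct SpeakerProfile {
    let name: String
    let meanFeatures: [Double]
}

// Plain Euclidean distance between two feature vectors.
func euclidean(_ a: [Double], _ b: [Double]) -> Double {
    var sum = 0.0
    for (x, y) in zip(a, b) { sum += (x - y) * (x - y) }
    return sum.squareRoot()
}

// Assign an utterance's features to the nearest profile, or report
// an unknown speaker if nothing is close enough.
func identifySpeaker(features: [Double],
                     profiles: [SpeakerProfile],
                     maxDistance: Double = 10.0) -> String? {
    let best = profiles.min {
        euclidean(features, $0.meanFeatures) < euclidean(features, $1.meanFeatures)
    }
    guard let match = best,
          euclidean(features, match.meanFeatures) <= maxDistance else {
        return nil // unknown speaker
    }
    return match.name
}
```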
