Comparing pitches with digital audio - signal-processing

I'm working on an application that will compare musical notes with digital audio. My first idea was to analyze a WAV file (or sound in real time) with some polyphonic pitch-detection algorithm, extract the notes and chords from the file, and then compare them with the notes in my dataset. I've been through a lot of pages, and it seems to be a lot of hard work because existing implementations and algorithms mainly (or only) focus on monophonic sound.
Now I have the idea of doing this the opposite way. In the dataset I have, for example, a note: A4, or a better example, a chord: A4 B4 H4. My idea is to synthesize some waveform (or whatever, I don't know what) from this note or chord and then compare it with the piece of digital audio.
Is this a good idea? Is it a better or harder solution?
If so, can you recommend how to do it?

The easiest solution is to take the FFT (Fast Fourier Transform) of the waveform: all the notes (and their harmonics) will be present in the signal. You then look for the frequencies that correspond to notes, and there's your solution.
Note: in order to get decent frequency resolution you need a sufficiently long sample (the resolution is roughly the sample rate divided by the number of samples in the FFT), and a high enough sample rate to cover the highest harmonics you care about. But try it and you will see.
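As a minimal sketch of this idea in Python (the WAV file name is a placeholder, and the fixed threshold is a crude stand-in for proper peak picking):

import numpy as np
from scipy.io import wavfile

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq):
    # MIDI note number 69 corresponds to A4 = 440 Hz
    midi = int(round(69 + 12 * np.log2(freq / 440.0)))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

sample_rate, samples = wavfile.read("chord.wav")  # placeholder file name
if samples.ndim > 1:
    samples = samples.mean(axis=1)  # mix down to mono

spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)

# keep the strongest bins (arbitrary threshold) and name the notes they fall on
threshold = 0.2 * spectrum.max()
strong = freqs[(spectrum > threshold) & (freqs > 20)]
print(sorted({freq_to_note(f) for f in strong}))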
Here are a couple of screen shots of an app called SpectraWave that I took sitting in front of my piano. The first is of middle A (f = 440 Hz as you know):
and the second is of an A-minor chord (as you can see, my middle finger is a little stronger and the C is showing up as the note with the greatest volume). The harmonics will soon make it hard to see more than just a few notes…

Your "solution" most likely makes matching even more difficult, since you will have no idea what waveform to make for each note. Most musical instruments and voices not only produce waveforms that are significantly different from single sinewaves or any other familiar waveform, but these waveforms evolve over time. Thus guessing the proper
waveform to use for each note for a match is extremely improbable.

Related

SuperpoweredSDK Frequencies Example

I'm building an iOS app using the SuperpoweredFrequencies project as an example. Everything is working great. I've increased the number of bands to 55 and experimented with widths of 1/12 and 1/24 to tighten up the filtering range surrounding the individual frequencies in question.
I've noticed something when testing with a musical instrument: when I play lower notes, starting at approximately A 110, the amplitudes of those frequencies register much lower than when I play higher notes, say A 220 or A 440. This makes detecting the fundamental frequency more difficult when lower notes are played, as it often appears as if I am playing the note an octave higher (the harmonic frequencies show up more prominently than the fundamental frequency for lower notes).
Can someone shed some light on this phenomenon? It doesn't appear to be due to the iPhone's mic, because the same thing happens when testing on both my iMac and MacBook. Is there a way of dealing with this issue using Superpowered's API so that the fundamental frequency can be detected when lower notes are being played?
Correction: I was testing a little more this morning with a guitar, and what I noticed is that for the E (82.4069) and F (87.3071) the fundamental frequencies (82.xxx and 87.xxx) register less prominently than the perfect fifth above those frequencies, B and B# respectively.
Maybe it is just due to the nature of the guitar as an instrument. Unfortunately I don't have a piano to test with. How do the responses look when playing the low notes on a piano?
The sensitivity of the iPhone's microphone may be lower in that region: https://blog.faberacoustical.com/2009/ios/iphone/iphone-microphone-frequency-response-comparison/
That's why harmonics may be picked up at a higher volume.
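If the fundamental is attenuated like that, one common workaround (not something the Superpowered filter bank gives you directly, so treat this as a plain-numpy sketch on a raw audio buffer) is a harmonic product spectrum, which boosts the bin whose harmonics line up even when the fundamental itself is weak:

import numpy as np

def fundamental_hps(signal, sample_rate, n_harmonics=4):
    # multiply the magnitude spectrum by downsampled copies of itself;
    # the harmonics of the true fundamental then reinforce each other
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    hps = spectrum.copy()
    for h in range(2, n_harmonics + 1):
        decimated = spectrum[::h]
        hps[:len(decimated)] *= decimated
    hps[:int(50 * len(signal) / sample_rate)] = 0  # ignore DC and rumble below ~50 Hz
    peak_bin = int(np.argmax(hps))
    return peak_bin * sample_rate / len(signal)

For a low guitar E (~82 Hz) this tends to pick the fundamental even when the second or third partial carries more energy.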

OpenCV: Effective way to automatically set NoiseSigma in BackgroundSubtractorMOG

I'm trying to make a program that detects people in CCTV footage, and I've made a lot of progress. Unfortunately, the amount of noise in the videos varies a lot between different cameras and times of day, so it differs with each of the sample videos. This means that the NoiseSigma needed varies from 1 to 25.
I've used the fastNlMeansDenoisingColored function and that helped a bit, but NoiseSigma is still an issue.
Would it be effective to maybe loop through the video once, and somehow get an idea for how noisy the video is and make a relationship for noise vs NoiseSigma? Any ideas would be welcome.
I don't think it's possible to determine the noise level in an image (or video) without reference data that doesn't contain any noise. One thing that comes to mind is to record some static scenery, measure how the frames differ from each other, and then try to find some relationship (hopefully linear) between that measure and NoiseSigma. If there were no noise, the accumulated difference between frames would be 0. By accumulated difference I mean something like this:
import cv2

cumulative_error = 0.0
for i in range(1, len(frames)):
    # sum of absolute per-pixel differences between consecutive frames
    cumulative_error += cv2.absdiff(frames[i], frames[i - 1]).sum()
cumulative_error /= len(frames)

Here absdiff(...).sum() adds up all elements of the difference image to produce a scalar value.
Please keep in mind that I'm just following my intuition here and it's not a method I've seen before.
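To make the "hopefully linear" part concrete: once you have measured the accumulated difference for a few clips whose best NoiseSigma you tuned by hand, a least-squares fit could turn the measure into an estimate (the data points below are made up purely for illustration):

import numpy as np

# hand-tuned (accumulated difference, NoiseSigma) pairs -- hypothetical values
measured_error = np.array([0.8, 2.5, 6.0, 11.0])
tuned_sigma = np.array([1.0, 5.0, 14.0, 25.0])

slope, intercept = np.polyfit(measured_error, tuned_sigma, deg=1)

def estimate_noise_sigma(cumulative_error):
    # clamp to the 1-25 range mentioned in the question
    return float(np.clip(slope * cumulative_error + intercept, 1.0, 25.0))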

Tracking of a manually marked point in the next frames of a video

I have to track a point that is manually marked in the first frame. The location of this marked point then has to be found in the next consecutive frames of the video. The video is grayscale (a fluorescence video).
Which algorithm or technique should I apply?
I would say KLT (Kanade-Lucas-Tomasi) is your best bet. See this example in MATLAB.
Depends.
What kind of apparent motion are you trying to track? Is it mostly a pure translation, or do you expect significant rotation/stretching/scaling from frame to frame (e.g. are you trying to track a spoke in the wheel of a car)?
If you can live with pure translation, KLT, as Dima suggested, is one worth trying. However, if the size of the used pattern and the inter-frame displacements are largish (meaning you may have to search in largish windows), FFT-based normalized correlation may be a winner (see J.P. Lewis's classic paper for details).
If you need a more complex motion model, I'd first try the affine extension of the KLT. I personally did use that one to track spokes of car wheels.
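For the pure-translation case, here is a minimal sketch of KLT-style tracking using OpenCV's pyramidal Lucas-Kanade implementation (the video file name and the marked point coordinates are placeholders; the window size and pyramid depth are defaults to tune):

import cv2
import numpy as np

cap = cv2.VideoCapture("fluorescence.avi")  # placeholder file name
ok, first = cap.read()
prev_gray = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)

# the point marked by hand in the first frame (placeholder coordinates)
point = np.array([[[123.0, 245.0]]], dtype=np.float32)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # estimate where the marked point moved to in the current frame
    new_point, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, point, None, winSize=(21, 21), maxLevel=3)
    if status[0][0] == 1:
        point = new_point
        x, y = point[0][0]
        print("tracked point:", x, y)
    prev_gray = gray

If the tracker loses the point (status becomes 0), re-marking it or falling back to the FFT-based normalized correlation mentioned above is a reasonable option.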

How does knocktounlock work?

I am trying to figure out how knocktounlock.com is able to detect "knocks" on the iPhone. I am sure they use the accelerometer to achieve this; however, all my attempts produce false positives (if the user moves, jumps, etc., it sometimes fires).
Basically, I want to be able to detect when a user knocks/taps/smacks their phone (and be able to distinguish that from other things that also cause a rise in the accelerometer reading). So I am looking for sharp, high peaks. The device will be in the pocket, so the device itself will not be moving much.
I have tried things like high-pass/low-pass filtering (not sure if there would be a better option).
This is a duplicate of this: Detect hard taps anywhere on iPhone through accelerometer, but it has not received any answers.
Any help/suggestions would be awesome! Thanks.
EDIT: Looking for more thoughts before I accept the answer below. I did hear back from Knocktounlock, and they use the fourth derivative (jounce) to get better values to analyse, which is interesting.
I would consider a knock on the iPhone to be exactly the same as bumping two phones together. Check out this GitHub repo:
https://github.com/joejcon1/iOS-Accelerometer-visualiser
Build&Run the App on iPhone and check out the spikes on Green line. You can see the value of the spike clearly,
Knocking the iPhone:
As you can see, the actual spike is very short in time when you knock the phone. The spike patterns are a little different for a hard knock and a soft knock, but they can be distinguished programmatically.
Now let's see the accelerometer pattern when the iPhone moves freely in space.
As you can see, the spikes are bell-shaped, which means it takes a little while for the spike value to return to 0.
From these patterns it will be easier to recognize a knock. Good luck.
Also, this will drain your battery, as the sensor will always be running and the iPhone needs to keep a persistent Bluetooth connection with the Mac.
P.S.: Also check this answer, https://stackoverflow.com/a/7580185/753603
I think the way to go here is using pattern recognition with accelerometer data.
You could (write and) train a classifier (e.g. K-nearest neighbors) with data you have gathered and classified by hand. Neural networks are also an option. There will be many different ways to solve this problem, but there is probably no straightforward way to achieve it.
Some papers showing pattern recognition approaches to similar topics (activity, movement), like
http://www.math.unipd.it/~cpalazzi/papers/Palazzi-Accelerometer.pdf
(some more, but I am not allowed to post them with my reputation count. You can search for "pattern recognition accelerometer data")
There is also a master thesis about gesture recognition on the iPhone:
http://klingmann.ch/msc_thesis_marco_klingmann_iphone_gestures.pdf
In general you won't achieve 100% correct classification. Depending on the time and knowledge you have, the result will vary between quite usable and we-could-have-used-random-classification-instead.
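As a minimal sketch of the K-nearest-neighbor idea (the windows below are synthetic placeholders; in practice you would record real windows of accelerometer magnitudes and label them by hand):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def features(window):
    # window: 1-D array of acceleration magnitudes covering a fraction of a second
    return [window.max(), window.std(), np.abs(np.diff(window)).max()]

# Hypothetical hand-labelled windows: knock = 1, other movement = 0
rng = np.random.default_rng(0)
knocks = [rng.normal(0, 0.05, 50) + np.eye(50)[25] * 2.0 for _ in range(20)]
others = [rng.normal(0, 0.3, 50) for _ in range(20)]

X = np.array([features(w) for w in knocks + others])
y = np.array([1] * len(knocks) + [0] * len(others))
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

new_window = rng.normal(0, 0.05, 50)  # an unlabelled window to test
print("knock" if clf.predict([features(new_window)])[0] == 1 else "no knock")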
Just a thought, but it could be useful to add the output of the microphone into the mix, listening for really short, loud noises at the same time that a possible "knock" movement is detected.
I am surprised that the 4th derivative is needed; intuitively it feels to me the 3rd ("jerk", the derivative of acceleration) should be enough. It is a big hint about what to keep an eye on, though.
It seems quite simple to me: collect accelerometer data at a high rate, plot it on a chart, observe. Calculate the first derivative from that, plot and observe. Then rinse and repeat with the derivative of the last one. Draw conclusions. I highly doubt you will need pattern recognition per se (clustering, classifiers, what have you); I think you will see a very distinct peak on one of your charts, and may only need to tune the collection rate and smoothing.
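A rough sketch of that rinse-and-repeat differencing (the sample rate and threshold are made-up numbers to tune against your own charts; np.diff is a crude stand-in for a proper derivative):

import numpy as np

SAMPLE_RATE = 100.0    # Hz -- assumed accelerometer update rate
JERK_THRESHOLD = 40.0  # made-up value; tune by inspecting your own plots

def successive_diffs(acc_magnitudes, order=2):
    # returns [acceleration, 1st difference ("jerk"), 2nd difference ("jounce"), ...]
    out = [np.asarray(acc_magnitudes, dtype=float)]
    for _ in range(order):
        out.append(np.diff(out[-1]) * SAMPLE_RATE)
    return out

def looks_like_knock(acc_magnitudes):
    # a knock shows up as one narrow, very tall spike in the jerk,
    # while walking or jumping produces broader, slower bumps
    jerk = successive_diffs(acc_magnitudes, order=1)[1]
    return np.abs(jerk).max() > JERK_THRESHOLD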
It is more interesting to me how come you don't have to be running the KnockToUnlock app for this to work. And if it is running in the background, who lets it run there for unlimited time? I don't think the accelerometer qualifies for unlimited background running. After some pondering, I am guessing the reason is that the app uses Bluetooth to connect to the Mac as an accessory, and as such gets a pass from iOS to run in the background (and drain your battery, shhht).
To solve this problem you need to select the sampling frequency. A tap (knock) has very high-frequency content, so you should choose an accelerometer rate of at least 50 Hz (perhaps even 100 Hz) for good tap detection in the presence of noise from other movements.
Using a classifier is necessary, but in order to save battery you should not call the classifier very often. You should write a simple algorithm that finds only taps and situations similar to knocks, and reports when your program needs to call the classifier.
Note the gyroscope signal: it also responds to knocks. In addition, the gyroscope signal does not need to have a constant component separated out, and it contains less noise.
Here is a good video about the basics of working with smartphone sensors: http://talkminer.com/viewtalk.jsp?videoid=C7JQ7Rpwn2k#.UaTTYUC-2Sp

Identify a specific sound on iOS

I'd like to be able to recognise a specific sound in an iOS application. I guess it would basically work like speech recognition in that it's fairly fuzzy, but it would only have to be for 1 specific sound.
I've done some quick FFT work to identify specific frequencies over a certain threshold, and only when they're solo (i.e. not surrounded by other frequencies), so I can identify individual tones pretty easily. I'm thinking this is just an extension of that, but comparing against FFT data from a recording of the sound, comparing, say, 0.1-second chunks over the length of the audio. I would also have to account for variation in amplitude, a little in pitch, and a little in time.
Can anyone point me to any pre-existing source that I could use to speed this process along? I can't seem to find anything usable. Or failing that, any ideas on how to get started on something like this?
Thanks very much
From your description it is not entirely clear what you want to do.
What is the "specific" sound like? Does it have high background noise?
What's the specific recognizable feature (e.g. pitch, inharmonicity, timbre ...)?
Against which other "sounds" do you want to compare it?
Do you simply want to match an arbitrary sound spectrum against a "template sound"?
Is your sound percussive, melodic, speech, ...? Is it long, short ...?
What's the frequency range in which you expect the best discriminability? Are the features invariant over time?
There is no "general" solution that works for everything. Speech recognition in itself is fairly complex and wont work well for abstract sounds whose discriminable frequencies are not in the e.g. MEL bands.
So in conclusion, you are leaving too many open questions to get a useful answer.
Only suggestion i can make based on the few informations is the following:
For the template sound:
1) Extract spectral peak positions from the power spectrum
2) Measure the standard deviation around the peaks and construct a Gaussian from it
3) Save the Gaussians for later classification
For unknown sounds:
1) Extract spectral peak positions
2) Project those positions onto the saved Gaussians, which leaves you with z-scores of the peak positions
3) With the computed z-scores you should be able to decide whether the unknown sound matches your template sound
Note: this is a very crude method that discriminates sounds according to their most powerful frequencies. Using the Gaussians leaves room for slight shifts in those frequencies.
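A rough sketch of that recipe, assuming the spectra are numpy magnitude arrays, reading step 2 as "how the peak positions vary across several recordings of the template", and using arbitrary peak-picking parameters:

import numpy as np
from scipy.signal import find_peaks

def peak_positions(power_spectrum, num_peaks=5):
    # strongest spectral peaks (bin indices); assumes at least num_peaks exist
    peaks, props = find_peaks(power_spectrum, height=0)
    strongest = peaks[np.argsort(props["peak_heights"])[::-1][:num_peaks]]
    return np.sort(strongest)

def template_gaussians(template_spectra, num_peaks=5):
    # one Gaussian (mean, std) per peak position, estimated over the recordings
    positions = np.array([peak_positions(s, num_peaks) for s in template_spectra])
    return positions.mean(axis=0), positions.std(axis=0) + 1e-6

def matches_template(unknown_spectrum, means, stds, max_z=2.0, num_peaks=5):
    # z-scores of the unknown sound's peaks under the saved Gaussians
    z = np.abs(peak_positions(unknown_spectrum, num_peaks) - means) / stds
    return bool(np.all(z < max_z))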
