On extracting the sound pressure level from AVAudioPCMBuffer - ios

I have almost no knowledge of signal processing, and currently I'm trying to implement a function in Swift that triggers an event when there is an increase in the sound pressure level (e.g. when a human screams).
I am tapping into an input node of an AVAudioEngine with a callback like this:
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) {
    (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
    let arraySize = Int(buffer.frameLength)
    let samples = Array(UnsafeBufferPointer(start: buffer.floatChannelData![0], count: arraySize))
    // do something with samples
    let volume = 20 * log10(samples.reduce(0) { $0 + $1 } / Float(arraySize))
    if !volume.isNaN {
        print("this is the current volume: \(volume)")
    }
}
After converting the buffer into a float array, I tried to get a rough estimate of the sound pressure level by computing the mean.
But this gives me values that fluctuate a lot even when the iPad is just sitting in a quiet room:
this is the current volume: -123.971
this is the current volume: -119.698
this is the current volume: -147.053
this is the current volume: -119.749
this is the current volume: -118.815
this is the current volume: -123.26
this is the current volume: -118.953
this is the current volume: -117.273
this is the current volume: -116.869
this is the current volume: -110.633
this is the current volume: -130.988
this is the current volume: -119.475
this is the current volume: -116.422
this is the current volume: -158.268
this is the current volume: -118.933
There is indeed a significant increase in this value if I clap near the microphone.
So I can do something like first computing the mean of these volumes during the preparing phase, and then checking for a significant increase during the event-triggering phase:
if !volume.isNaN {
    if isInThePreparingPhase {
        print("this is the current volume: \(volume)")
        volumeSum += volume
        volumeCount += 1
    } else if isInTheEventTriggeringPhase {
        if volume > meanVolume {
            // triggers an event
        }
    }
}
where meanVolume is computed during the transition from the preparing phase to the event-triggering phase: meanVolume = volumeSum / Float(volumeCount)
....
However, there appears to be no significant increase if I play loud music beside the microphone. And on rare occasions, volume is greater than meanVolume even when there is no increase in volume audible to the human ear.
So what is the proper way of extracting the sound pressure level from AVAudioPCMBuffer?
Wikipedia gives a formula like this:
Lp = 20 * log10(p / p0) dB
with p being the root mean square sound pressure and p0 being the reference sound pressure.
But I have no idea what the float values in AVAudioPCMBuffer.floatChannelData represent. The Apple documentation only says
The buffer's audio samples as floating point values.
How should I work with them?

Thanks to the response from @teadrinker I finally found a solution to this problem. Here is my Swift code that outputs the volume of the AVAudioPCMBuffer input:
private func getVolume(from buffer: AVAudioPCMBuffer, bufferSize: Int) -> Float {
    guard let channelData = buffer.floatChannelData?[0] else {
        return 0
    }
    let channelDataArray = Array(UnsafeBufferPointer(start: channelData, count: bufferSize))

    var outEnvelope = [Float]()
    var envelopeState: Float = 0
    let envConstantAtk: Float = 0.16
    let envConstantDec: Float = 0.003

    for sample in channelDataArray {
        let rectified = abs(sample)

        if envelopeState < rectified {
            envelopeState += envConstantAtk * (rectified - envelopeState)
        } else {
            envelopeState += envConstantDec * (rectified - envelopeState)
        }
        outEnvelope.append(envelopeState)
    }

    // 0.015 is the noise floor threshold, to prevent
    // triggering on the noise entering from the microphone
    if let maxVolume = outEnvelope.max(),
        maxVolume > Float(0.015) {
        return maxVolume
    } else {
        return 0.0
    }
}

I think the first step is to get the envelope of the sound. You could use simple averaging to calculate an envelope, but you need to add a rectification step (this usually means using abs() or square() to make all samples positive).
More commonly, a simple IIR filter is used instead of averaging, with different constants for attack and decay (here is a lab). Note that these constants depend on the sampling frequency; you can use this formula to calculate the constants:
1 - exp(-timePerSample*2/smoothingTime)
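For example, a small helper (my own sketch, not from the answer) that derives these attack and decay coefficients from a sample rate and a smoothing time in seconds:

import Foundation

// Sketch only: derive envelope coefficients from the formula above.
// The function name and the example smoothing times are illustrative.
func envelopeCoefficient(sampleRate: Double, smoothingTime: Double) -> Float {
    let timePerSample = 1.0 / sampleRate
    return Float(1.0 - exp(-timePerSample * 2.0 / smoothingTime))
}

// At 44.1 kHz, a smoothing time of roughly 0.26 ms gives ~0.16 (the attack
// constant used above) and roughly 15 ms gives ~0.003 (the decay constant).
let envConstantAtk = envelopeCoefficient(sampleRate: 44_100, smoothingTime: 0.00026) // ≈ 0.16
let envConstantDec = envelopeCoefficient(sampleRate: 44_100, smoothingTime: 0.015)   // ≈ 0.003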
Step 2
When you have the envelope, you can smooth it with an additional filter and then compare the two envelopes to find a sound that is louder than the base level (here's a more complete lab).
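As a rough illustration of that idea, here is a minimal sketch of my own; the coefficients and the 1.5 trigger ratio are illustrative values, not ones taken from the answer:

struct EnvelopeFollower {
    var state: Float = 0
    let attack: Float
    let decay: Float

    mutating func process(_ sample: Float) -> Float {
        let rectified = abs(sample)
        let coefficient = rectified > state ? attack : decay
        state += coefficient * (rectified - state)
        return state
    }
}

// One fast follower tracks the current sound, one very slow follower tracks
// the background level; an "event" is when the fast one clearly exceeds the slow one.
var fast = EnvelopeFollower(attack: 0.16, decay: 0.003)
var slow = EnvelopeFollower(attack: 0.0005, decay: 0.0005)

func containsEvent(samples: [Float]) -> Bool {
    var triggered = false
    for sample in samples {
        let level = fast.process(sample)
        let baseline = slow.process(sample)
        if level > baseline * 1.5 && level > 0.015 {
            triggered = true
        }
    }
    return triggered
}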
Note that detecting audio "events" can be quite tricky and hard to predict, so make sure you have a lot of debugging aids!

Related

Detect when a chopping motion has been made - Swift

I'm starting with my first app for iOS and I am trying to get gyro data to play a whip sound when you flick your phone.
From what I can tell, I should be using CoreMotion to get the state of the gyro, then doing some math to work out when a whip-like gesture is made, and then running my function?
This is what I have so far - this is my ContentView.swift file.
import SwiftUI
import AVFoundation
import CoreMotion

let popSound = Bundle.main.url(forResource: "whip", withExtension: "mp3")
var audioPlayer = AVAudioPlayer()
var motionManager: CMMotionManager!

func audioPlayback() {
    do {
        audioPlayer = try AVAudioPlayer(contentsOf: popSound!)
        audioPlayer.play()
    } catch {
        print("couldn't load sound file")
    }
}

struct ContentView: View {
    var body: some View {
        Text("Press button!")
        Button(action: {
            audioPlayback()
        }, label: {
            Text("Press me!")
        })
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
    }
}
Currently it's set to a button. Can someone link me to a resource, or walk me though this?
Usually when dealing with such devices you can either get a current snapshot or you can request a feed of snapshot changes. In your case the promising methods seem to be startAccelerometerUpdates and startDeviceMotionUpdates on CMMotionManager. I am pretty sure somewhere in there should be sufficient information to detect the gesture you are describing.
If you dig into these methods you will see that you get a feed of "frames" where each frame describes the situation at a certain time.
Since you are detecting a gesture, you are not interested in a single frame but rather in a series of frames. So probably the first thing you need is some object to which you can append frames, and this object will evaluate whether the current set of frames corresponds to your gesture or not. It should also be able to discard data which is not interesting; for instance, frames older than 3 seconds can be discarded, as this gesture should never need more than 3 seconds.
So now your problem is split into 2 parts. The first part is creating an object that is able to collect frames. I would give it a public method like appendFrame or appendSnapshot. Then keep collecting frames on it. The object also needs to be able to report back that it has detected the required gesture, so that you play a sound at that point. Without the detection in place you can mock it: for instance, after 100 frames the buffer is cleared and the notification is reported back, which then triggers the sound. So no detection at this point, but everything else.
The second part is the detection itself. You now have a pool of samples, frames or snapshots. You can at any time aggregate the data any way you want to. You would probably use a secondary thread to process the data so the UI is not laggy, and to be able to throttle how much CPU power you put into it. As for the detection itself, I would say you may try to create some samples and try to figure out the "math" part. When you have some idea, or can at least present the community with some recordings, you could ask another specific question about that. It does look like a textbook example for using Machine Learning, for instance.
From a mathematical point of view there may be some shortcuts. A very simple example would be just looking at the direction of your device as a normalized direction (x, y, z). I think you can actually already get that very easily from native components. In a "chopping" motion we expect that the rotation suddenly (nearly) stops and that the direction was recently (nearly) 90 degrees offset from the current direction.
Speed:
Assuming you have an array of directions such as let directions: [(x: CGFloat, y: CGFloat, z: CGFloat)], then you could identify rotation speed changes with the length of the cross product:
let rotationSpeed = length(cross(directions[index], directions[index+1]))
The speed should always be a value between 0 and 1, where a maximum of 1 would mean a 90-degree change. Hopefully it never comes to that and you always stay at values between 0 and 0.3. If you DO get values larger than 0.5, the frame rate of your device is too low and those samples are best discarded.
Using this approach you can map your rotations from an array of vectors to an array of speeds, rotationSpeeds: [Float], which is more convenient to work with. You are now looking within this array for a part where the rotation speed suddenly drops from a high value to a low value. What those values are you will need to test and tweak yourself. But a "sudden drop" should not span only 2 sequential samples; you need to find, for instance, 5 high-speed frames followed by 2 low-speed frames, or rather even more than that.
Once you find such a point, you have a candidate for the end of your chop. At this point you can go backwards and check all frames going back in time up to somewhere between 0.5 and 1.0 seconds from the candidate (again a value you will need to try out yourself). If any of these frames is nearly 90 degrees away from the candidate, then you have your gesture. Something like the following should do:
length(cross(directions[index], directions[candidateIndex])) > 0.5
where the 0.5 is again something you will need to test. The closer to 1.0 the more precise the gesture needs to be. I think 0.5 should be pretty good to begin with.
Perhaps you can play with the following and see if you can get satisfying results:
struct Direction {
    let x: Float
    let y: Float
    let z: Float

    static func cross(_ a: Direction, _ b: Direction) -> Direction {
        Direction(x: a.y*b.z - a.z*b.y, y: a.z*b.x - a.x*b.z, z: a.x*b.y - a.y*b.x)
    }

    var length: Float { (x*x + y*y + z*z).squareRoot() }
}

class Recording<Type> {
    private(set) var samples: [Type] = [Type]()

    func appendSample(_ sample: Type) { samples.append(sample) }
}

class DirectionRecording: Recording<Direction> {
    func convertToSpeedRecording() -> SpeedRecording {
        let recording = SpeedRecording()
        if samples.count > 1 { // Need at least 2 samples
            for index in 0..<samples.count-1 {
                recording.appendSample(Direction.cross(samples[index], samples[index+1]).length)
            }
        }
        return recording
    }
}

class SpeedRecording: Recording<Float> {
    func detectSuddenDrops(minimumFastSampleCount: Int = 4, minimumSlowSampleCount: Int = 2, maximumThresholdSampleCount: Int = 2, minimumSpeedTreatedAsHigh: Float = 0.1, maximumSpeedThresholdTreatedAsLow: Float = 0.05) -> [Int] { // Returns an array of indices where a sudden drop occurred
        var result: [Int] = [Int]()

        // Using states to identify where in the sequence we currently are.
        // The state should go none -> highSpeed -> lowSpeed
        // Or the state should go none -> highSpeed -> thresholdSpeed -> lowSpeed
        enum State {
            case none
            case highSpeed(sequenceLength: Int)
            case thresholdSpeed(sequenceLength: Int)
            case lowSpeed(sequenceLength: Int)
        }

        var currentState: State = .none

        samples.enumerated().forEach { index, sample in
            if sample > minimumSpeedTreatedAsHigh {
                // Found a high speed sample
                switch currentState {
                case .none: currentState = .highSpeed(sequenceLength: 1) // Found a first high speed sample
                case .lowSpeed: currentState = .highSpeed(sequenceLength: 1) // From low speed to high speed resets it back to high speed step
                case .thresholdSpeed: currentState = .highSpeed(sequenceLength: 1) // From threshold speed to high speed resets it back to high speed step
                case .highSpeed(let sequenceLength): currentState = .highSpeed(sequenceLength: sequenceLength+1) // Append another high speed sample
                }
            } else if sample > maximumSpeedThresholdTreatedAsLow {
                // Found a sample somewhere between fast and slow
                switch currentState {
                case .none: break // Needs to go to high speed first
                case .lowSpeed: currentState = .none // Low speed back to threshold resets to beginning
                case .thresholdSpeed(let sequenceLength):
                    if sequenceLength < maximumThresholdSampleCount { currentState = .thresholdSpeed(sequenceLength: sequenceLength+1) } // Can still stay inside threshold
                    else { currentState = .none } // In threshold for too long. Resetting back to start
                case .highSpeed: currentState = .thresholdSpeed(sequenceLength: 1) // A first transition from high speed to threshold
                }
            } else {
                // A low speed sample found
                switch currentState {
                case .none: break // Waiting for high speed sample sequence
                case .lowSpeed(let sequenceLength):
                    if sequenceLength < minimumSlowSampleCount { currentState = .lowSpeed(sequenceLength: sequenceLength+1) } // Not enough low speed samples yet
                    else { result.append(index); currentState = .none } // Got everything we need. This is a HIT
                case .thresholdSpeed: currentState = .lowSpeed(sequenceLength: 1) // Threshold can always go to low speed
                case .highSpeed: currentState = .lowSpeed(sequenceLength: 1) // High speed can always go to low speed
                }
            }
        }
        return result
    }
}

func recordingContainsAChoppingGesture(recording: DirectionRecording, minimumAngleOffset: Float = 0.5, maximumSampleCount: Int = 50) -> Bool {
    let speedRecording = recording.convertToSpeedRecording()
    return speedRecording.detectSuddenDrops().contains { index in
        for offset in 1..<maximumSampleCount {
            let sampleIndex = index-offset
            guard sampleIndex >= 0 else { return false } // Can not go back any further than that
            if Direction.cross(recording.samples[index], recording.samples[sampleIndex]).length > minimumAngleOffset {
                return true // Got it
            }
        }
        return false // Sample count drained
    }
}
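A minimal usage sketch of my own (not part of the answer) that feeds CMMotionManager's gravity vector into DirectionRecording and plays the whip sound from the question when the gesture is found; the 60 Hz update rate and the 100-sample window are arbitrary illustrative choices:

import CoreMotion

let motionManager = CMMotionManager()
let recording = DirectionRecording()

func startChopDetection() {
    motionManager.deviceMotionUpdateInterval = 1.0 / 60.0
    motionManager.startDeviceMotionUpdates(to: OperationQueue.main) { motion, _ in
        guard let gravity = motion?.gravity else { return }
        // Gravity is already close to unit length, so it can serve as a direction sample.
        recording.appendSample(Direction(x: Float(gravity.x),
                                         y: Float(gravity.y),
                                         z: Float(gravity.z)))
        if recording.samples.count >= 100 {
            if recordingContainsAChoppingGesture(recording: recording) {
                audioPlayback() // the whip sound function from the question
            }
            // Discarding old samples is left out here; Recording would need
            // a removeAll-style method for that.
        }
    }
}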

Using AKMixer with volume lower 0.00001 there is no output

We are using two AKMixers (one for the left, one for the right channel) and one AKMixer as output, with these two mixers as its inputs.
If one of the input mixers has a volume lower than 0.00001, the output signal is lost. But lower output levels should be possible, because if we lower the main system volume while the mixer volume stays above 0.00001, the signal on the headphone jack gets even quieter.
As a workaround I tried setting the output AKMixer's volume to 0.5 and the input mixers to 0.00001, and that works too. But in my application I also need maximum output, and then I get weird "clicks" when changing both volume levels at once.
It would be great if somebody could help, either with the workaround or with the underlying problem.
Thanks.
var rightSine = AKOscillator(waveform: AKTable(.sine))
var rightPanner: AKMixer!

let pan2 = AKPanner(self.rightSine, pan: 1)
pan2.rampDuration = 0
let right1: AKMixer = AKMixer(pan2 /*, .... some more */)
self.rightPanner = right1

let mix = AKMixer(self.rightPanner /* left channel... */)
mix.volume = 1.0
AudioKit.output = mix
do {
    try AudioKit.start()
} catch {
}
self.rightPanner.volume = 0.00002
This is the code used to initialise the audio stuff (shortened) and afterwards the nodes are started.
Edit: I'm testing the precise threshold at which the output breaks.
AudioKit's AKMixer is a simple wrapper around Apple's AVAudioMixerNode and, as such, I can't really dig much deeper to help you solve the problem with that node. But if you're willing to switch to AKBooster, whose job is to amplify or diminish a signal, I think you will be fine using small numbers for your gain value.
var rightSine = AKOscillator(waveform: AKTable(.sine))
var rightBooster: AKBooster!

let pan2 = AKPanner(self.rightSine, pan: 1)
pan2.rampDuration = 0
let right1: AKMixer = AKMixer(pan2 /*, .... some more */)
self.rightBooster = AKBooster(right1)

let mix = AKMixer(self.rightBooster /* left channel... */)
mix.volume = 1.0
AudioKit.output = mix

self.rightBooster.gain = 0.00002

How do I achieve very accurate timing in Swift?

I am working on a musical app with an arpeggio/sequencing feature that requires great timing accuracy. Currently, using Timer I have achieved an average jitter of ~5 ms, but a max jitter of ~11 ms, which is unacceptable for fast arpeggios of 8th and 16th notes, and especially 32nd notes.
I've read that CADisplayLink is more accurate than Timer, but since it is limited to 1/60th of a second (~16-17 ms), it seems like it would be a less accurate approach than what I've achieved with Timer.
Would diving into CoreAudio be the only way to achieve what I want? Is there some other way to achieve more accurate timing?
I did some testing of Timer and DispatchSourceTimer (a.k.a. GCD timer) on an iPhone 7, with 1000 data points at an interval of 0.05 seconds. I was expecting the GCD timer to be appreciably more accurate (given that it had a dedicated queue), but I found that they were comparable, with the standard deviation of my various trials ranging from 0.2-0.8 milliseconds and the maximum deviation from the mean about 2-8 milliseconds.
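For reference, a rough sketch of that kind of jitter measurement (my own illustration, not the actual test harness): fire a GCD timer every 50 ms and record how far each callback lands from its expected time.

import Foundation

func measureTimerJitter(interval: TimeInterval = 0.05, count: Int = 1000,
                        completion: @escaping ([TimeInterval]) -> Void) {
    let queue = DispatchQueue(label: "timer.jitter", qos: .userInteractive)
    let timer = DispatchSource.makeTimerSource(queue: queue)
    let start = CFAbsoluteTimeGetCurrent()
    var deviations = [TimeInterval]()
    var fired = 0

    timer.schedule(deadline: .now() + interval, repeating: interval)
    timer.setEventHandler {
        fired += 1
        // Deviation of this callback from its ideal firing time.
        deviations.append(CFAbsoluteTimeGetCurrent() - (start + Double(fired) * interval))
        if fired == count {
            timer.cancel()
            completion(deviations) // compute mean / standard deviation / max from these
        }
    }
    timer.resume()
}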
When trying mach_wait_until as outlined in Technical Note TN2169: High Precision Timers in iOS / OS X, I achieved a timer that was roughly four times as accurate as what I achieved with either Timer or GCD timers.
Having said that, I'm not entirely confident that mach_wait_until is the best approach, as the determination of the specific policy values for thread_policy_set seems to be poorly documented. But the code below reflects the values I used in my tests, using code adapted from How to set realtime thread in Swift? and TN2169:
var timebaseInfo = mach_timebase_info_data_t()

func configureThread() {
    mach_timebase_info(&timebaseInfo)
    let clock2abs = Double(timebaseInfo.denom) / Double(timebaseInfo.numer) * Double(NSEC_PER_SEC)

    let period      = UInt32(0.00 * clock2abs)
    let computation = UInt32(0.03 * clock2abs) // 30 ms of work
    let constraint  = UInt32(0.05 * clock2abs)

    let THREAD_TIME_CONSTRAINT_POLICY_COUNT = mach_msg_type_number_t(MemoryLayout<thread_time_constraint_policy>.size / MemoryLayout<integer_t>.size)

    var policy = thread_time_constraint_policy()
    var ret: Int32
    let thread: thread_port_t = pthread_mach_thread_np(pthread_self())

    policy.period = period
    policy.computation = computation
    policy.constraint = constraint
    policy.preemptible = 0

    ret = withUnsafeMutablePointer(to: &policy) {
        $0.withMemoryRebound(to: integer_t.self, capacity: Int(THREAD_TIME_CONSTRAINT_POLICY_COUNT)) {
            thread_policy_set(thread, UInt32(THREAD_TIME_CONSTRAINT_POLICY), $0, THREAD_TIME_CONSTRAINT_POLICY_COUNT)
        }
    }

    if ret != KERN_SUCCESS {
        mach_error("thread_policy_set:", ret)
        exit(1)
    }
}
I then could do:
private func nanosToAbs(_ nanos: UInt64) -> UInt64 {
    return nanos * UInt64(timebaseInfo.denom) / UInt64(timebaseInfo.numer)
}

private func startMachTimer() {
    Thread.detachNewThread {
        autoreleasepool {
            self.configureThread()

            var when = mach_absolute_time()
            for _ in 0 ..< maxCount {
                when += self.nanosToAbs(UInt64(0.05 * Double(NSEC_PER_SEC)))
                mach_wait_until(when)

                // do something
            }
        }
    }
}
Note, you might want to check whether when hasn't already passed (you want to make sure that your timers don't get backlogged if your processing can't be completed in the allotted time), but hopefully this illustrates the idea.
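For example, one way to add that guard (an illustrative variation of my own on the loop above, not part of the original test code):

private func startMachTimer() {
    Thread.detachNewThread {
        autoreleasepool {
            self.configureThread()

            let interval = self.nanosToAbs(UInt64(0.05 * Double(NSEC_PER_SEC)))
            var when = mach_absolute_time()
            for _ in 0 ..< maxCount {
                when += interval

                // Skip any deadlines that have already passed, so a slow
                // iteration doesn't cause a burst of late callbacks.
                while when < mach_absolute_time() {
                    when += interval
                }
                mach_wait_until(when)

                // do something
            }
        }
    }
}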
Anyway, with mach_wait_until, I achieved greater fidelity than Timer or GCD timers, at the cost of CPU/power consumption as described in What are the do's and dont's of code running with high precision timers?
I appreciate your skepticism on this final point, but I suspect it would be prudent to dive into CoreAudio and see if it might offer a more robust solution.
For acceptable musically accurate rhythms, the only suitable timing source is using Core Audio or AVFoundation.
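For example, with AVFoundation you can let the audio engine handle the timing by scheduling pre-rendered buffers at explicit sample positions. This is a minimal sketch of my own, assuming an AVAudioEngine that is already running, with the player node attached and connected:

import AVFoundation

func scheduleClicks(player: AVAudioPlayerNode, click: AVAudioPCMBuffer,
                    bpm: Double, beats: Int) {
    let sampleRate = click.format.sampleRate
    let framesPerBeat = AVAudioFramePosition(sampleRate * 60.0 / bpm)

    for beat in 0..<beats {
        // Sample-accurate start time for this beat, in the player's timeline.
        let when = AVAudioTime(sampleTime: AVAudioFramePosition(beat) * framesPerBeat,
                               atRate: sampleRate)
        player.scheduleBuffer(click, at: when, options: [], completionHandler: nil)
    }
    player.play()
}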
I'm working on a sequencer app myself, and I would definitely recommend using AudioKit for those purposes.
It has its own sequencer class.
https://audiokit.io/

Spectrogram from AVAudioPCMBuffer using Accelerate framework in Swift

I'm trying to generate a spectrogram from an AVAudioPCMBuffer in Swift. I install a tap on an AVAudioMixerNode and receive a callback with the audio buffer. I'd like to convert the signal in the buffer to a [Float: Float] dictionary where the key represents the frequency and the value represents the magnitude of the audio at the corresponding frequency.
I tried using Apple's Accelerate framework but the results I get seem dubious. I'm sure it's just in the way I'm converting the signal.
I looked at this blog post amongst other things for a reference.
Here is what I have:
self.audioEngine.mainMixerNode.installTapOnBus(0, bufferSize: 1024, format: nil, block: { buffer, when in
    let bufferSize: Int = Int(buffer.frameLength)

    // Set up the transform
    let log2n = UInt(round(log2(Double(bufferSize))))
    let fftSetup = vDSP_create_fftsetup(log2n, Int32(kFFTRadix2))

    // Create the complex split value to hold the output of the transform
    var realp = [Float](count: bufferSize/2, repeatedValue: 0)
    var imagp = [Float](count: bufferSize/2, repeatedValue: 0)
    var output = DSPSplitComplex(realp: &realp, imagp: &imagp)

    // Now I need to convert the signal from the buffer to complex value, this is what I'm struggling to grasp.
    // The complexValue should be UnsafePointer<DSPComplex>. How do I generate it from the buffer's floatChannelData?
    vDSP_ctoz(complexValue, 2, &output, 1, UInt(bufferSize / 2))

    // Do the fast Fourier forward transform
    vDSP_fft_zrip(fftSetup, &output, 1, log2n, Int32(FFT_FORWARD))

    // Convert the complex output to magnitude
    var fft = [Float](count: Int(bufferSize / 2), repeatedValue: 0.0)
    vDSP_zvmags(&output, 1, &fft, 1, vDSP_Length(bufferSize / 2))

    // Release the setup
    vDSP_destroy_fftsetup(fftSetup)

    // TODO: Convert fft to [Float: Float] dictionary of frequency vs magnitude. How?
})
My questions are:
1. How do I convert the buffer.floatChannelData to UnsafePointer<DSPComplex> to pass to the vDSP_ctoz function? Is there a different/better way to do it, maybe even bypassing vDSP_ctoz?
2. Is this different if the buffer contains audio from multiple channels? How is it different when the buffer audio channel data is or isn't interleaved?
3. How do I convert the indices in the fft array to frequencies in Hz?
4. Anything else I may be doing wrong?
Update
Thanks everyone for suggestions. I ended up filling the complex array as suggested in the accepted answer. When I plot the values and play a 440 Hz tone on a tuning fork it registers exactly where it should.
Here is the code to fill the array:
var channelSamples: [[DSPComplex]] = []
for var i=0; i<channelCount; ++i {
    channelSamples.append([])
    let firstSample = buffer.format.interleaved ? i : i*bufferSize
    for var j=firstSample; j<bufferSize; j+=buffer.stride*2 {
        channelSamples[i].append(DSPComplex(real: buffer.floatChannelData.memory[j], imag: buffer.floatChannelData.memory[j+buffer.stride]))
    }
}
The channelSamples array then holds separate array of samples for each channel.
To calculate the magnitude I used this:
var spectrum = [Float]()
for var i=0; i<bufferSize/2; ++i {
let imag = out.imagp[i]
let real = out.realp[i]
let magnitude = sqrt(pow(real,2)+pow(imag,2))
spectrum.append(magnitude)
}
1: Hacky way: you can just cast a float array, where the real and imaginary values go one after another.
2: It depends on whether the audio is interleaved or not. If it's interleaved (most of the cases), the left and right channels are in the array with a stride of 2.
3: The lowest frequency in your case is the frequency of a period of 1024 samples. At 44100 Hz that's ~23 ms, so the lowest frequency of the spectrum will be 1/(1024/44100) (~43 Hz). The next frequency will be twice that (~86 Hz) and so on.
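In code, that mapping is simply (a small sketch of my own, not from the answer):

// Bin i of an N-point FFT at a given sample rate corresponds to i * sampleRate / N Hz.
func frequency(forBin bin: Int, sampleRate: Double, fftSize: Int) -> Double {
    return Double(bin) * sampleRate / Double(fftSize)
}

// With a 1024-sample buffer at 44100 Hz: bin 1 ≈ 43 Hz, bin 2 ≈ 86 Hz,
// and a 440 Hz tone shows up around bin 10 (≈ 430.7 Hz).
let binWidth = frequency(forBin: 1, sampleRate: 44100, fftSize: 1024) // ≈ 43.07 Hz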
4: You have installed a callback handler on an audio bus. This is likely run frequently and with real-time thread priority. You should not do anything that has the potential for blocking (it will likely result in priority inversion and glitchy audio):
Allocate memory (realp, imagp: [Float](...) is shorthand for Array<Float>, and is likely allocated on the heap). Pre-allocate these instead.
Call lengthy operations such as vDSP_create_fftsetup() - which also allocates memory and initialises it. Again, you can allocate this once outside of your function.
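A sketch of that pre-allocation idea (my own illustration, using the modern Swift spellings of the vDSP calls): create the FFT setup and the scratch arrays once, and reuse them from inside the tap block.

import Accelerate
import Foundation

final class SpectrumAnalyzer {
    private let log2n: vDSP_Length
    private let fftSetup: FFTSetup
    private var realp: [Float]
    private var imagp: [Float]

    init?(bufferSize: Int) {
        log2n = vDSP_Length(round(log2(Double(bufferSize))))
        guard let setup = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2)) else { return nil }
        fftSetup = setup
        realp = [Float](repeating: 0, count: bufferSize / 2)
        imagp = [Float](repeating: 0, count: bufferSize / 2)
    }

    // Inside the tap, wrap realp/imagp in a DSPSplitComplex and run
    // vDSP_ctoz / vDSP_fft_zrip / vDSP_zvmags as in the question, without
    // allocating or creating the setup again.

    deinit {
        vDSP_destroy_fftsetup(fftSetup)
    }
}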

Detecting when someone begins walking using Core Motion and CMAccelerometer Data

I'm trying to detect three actions: when a user begins walking, jogging, or running. I then want to know when they stop. I've been successful in detecting when someone is walking, jogging, or running with the following code:
- (void)update:(CMAccelerometerData *)accelData {
    [(id) self setAcceleration:accelData.acceleration];

    NSTimeInterval secondsSinceLastUpdate = -([self.lastUpdateTime timeIntervalSinceNow]);

    if (fabs(_acceleration.x) >= 0.10000) {
        NSLog(@"walking: %f", _acceleration.x);
    }
    else if (fabs(_acceleration.x) > 2.0) {
        NSLog(@"jogging: %f", _acceleration.x);
    }
    else if (fabs(_acceleration.x) > 4.0) {
        NSLog(@"sprinting: %f", _acceleration.x);
    }
}
The problem I run into is two-fold:
1) update is called multiple times every time there's a motion, probably because it checks so frequently that when the user begins walking (i.e. _acceleration.x >= 0.1) it is still >= 0.1 when update is called again.
Example Log:
2014-02-22 12:14:20.728 myApp[5039:60b] walking: 1.029846
2014-02-22 12:14:20.748 myApp[5039:60b] walking: 1.071777
2014-02-22 12:14:20.768 myApp[5039:60b] walking: 1.067749
2) I'm having difficulty figuring out how to detect when the user stopped. Does anybody have advice on how to implement "Stop Detection"
According to your logs, accelerometerUpdateInterval is about 0.02. Updates could be less frequent if you change that property of CMMotionManager.
Checking only the x-acceleration isn't very accurate. I can put a device on a table in such a way (let's say on its left edge) that the x-acceleration will be equal to 1, or tilt it a bit. This will cause the program to be in walking mode (x > 0.1) instead of idle.
Here's a link to the ADVANCED PEDOMETER FOR SMARTPHONE-BASED ACTIVITY TRACKING publication. They track changes in the direction of the acceleration vector, using the cosine of the angle between two consecutive acceleration vector readings:
d = (a_i · a_(i-1)) / (|a_i| * |a_(i-1)|)
Obviously, without any motion, the angle between two vectors is close to zero and cos(0) = 1. During other activities d < 1. To filter out noise, they use a weighted moving average of the last 10 values of d.
After implementing this, your values will look like this (red - walking, blue - running):
Now you can set a threshold for each activity to separate them. Note that the average step frequency is 2-4 Hz. You should expect the current value to be over the threshold at least a few times per second in order to identify the action.
Other helpful publications:
ERSP: An Energy-efficient Real-time Smartphone Pedometer (analyze peaks and throughs)
A Gyroscopic Data based Pedometer Algorithm (threshold detection of gyro readings)
UPDATE
_acceleration.x, _acceleration.y, _acceleration.z are the coordinates of the same acceleration vector. You use each of these coordinates in the d formula. In order to calculate d you also need to store the acceleration vector from the previous update (the one with index i-1 in the formula).
The WMA just takes into account the last 10 d values with different weights. The most recent d values have more weight and therefore more impact on the resulting value. You need to store the 9 previous d values in order to calculate the current one. You should compare the WMA value to the corresponding threshold.
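An illustrative sketch of that computation (my own; the paper's exact weighting scheme may differ, here I use simple linear weights):

import CoreMotion

final class DirectionChangeDetector {
    private var previous: CMAcceleration?
    private var recentD: [Double] = []

    // Returns the weighted moving average of the last 10 d values, where d is
    // the cosine of the angle between consecutive acceleration vectors
    // (close to 1 when idle, noticeably below 1 during activity).
    func process(_ current: CMAcceleration) -> Double? {
        defer { previous = current }
        guard let last = previous else { return nil }

        let dot = current.x * last.x + current.y * last.y + current.z * last.z
        let norms = (current.x * current.x + current.y * current.y + current.z * current.z).squareRoot() *
                    (last.x * last.x + last.y * last.y + last.z * last.z).squareRoot()
        guard norms > 0 else { return nil }

        recentD.append(dot / norms)
        if recentD.count > 10 { recentD.removeFirst() }

        // Linear weights: the most recent d counts the most.
        let weights = (1...recentD.count).map(Double.init)
        let weightedSum = zip(recentD, weights).map { $0 * $1 }.reduce(0, +)
        return weightedSum / weights.reduce(0, +)
    }
}

You would feed each CMAccelerometerData.acceleration sample into process(_:) and compare the returned value against your per-activity thresholds.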
If you are using iOS 7 and an iPhone 5S, I suggest you look into CMMotionActivityManager, which is available on the iPhone 5S because of the M7 chip. It is also available on a couple of other devices:
M7 chip
Here is a code snippet I put together to test when I was learning about it.
#import <CoreMotion/CoreMotion.h>

@property (nonatomic, strong) CMMotionActivityManager *motionActivityManager;

- (void)inSomeMethod
{
    self.motionActivityManager = [[CMMotionActivityManager alloc] init];

    // register for Core Motion notifications
    [self.motionActivityManager startActivityUpdatesToQueue:[NSOperationQueue mainQueue] withHandler:^(CMMotionActivity *activity)
    {
        NSLog(@"Got a core motion update");
        NSLog(@"Current activity date is %f", activity.timestamp);
        NSLog(@"Current activity confidence from a scale of 0 to 2 - 2 being best - is: %ld", (long)activity.confidence);
        NSLog(@"Current activity type is unknown: %i", activity.unknown);
        NSLog(@"Current activity type is stationary: %i", activity.stationary);
        NSLog(@"Current activity type is walking: %i", activity.walking);
        NSLog(@"Current activity type is running: %i", activity.running);
        NSLog(@"Current activity type is automotive: %i", activity.automotive);
    }];
}
I tested it and it seems to be pretty accurate. The only drawback is that it will not give you a confirmation as soon as you start an action (walking for example). Some black box algorithm waits to ensure that you are really walking or running. But then you know you have a confirmed action.
This beats messing around with the accelerometer. Apple took care of that detail!
You can use this simple library to detect whether the user is walking, running, in a vehicle, or not moving. It works on all iOS devices and doesn't need the M7 chip.
https://github.com/SocialObjects-Software/SOMotionDetector
In the repo you can find a demo project.
I'm following this paper (PDF via RG) in my indoor navigation project to determine user dynamics (static, slow walking, fast walking) via accelerometer data alone, in order to assist location determination.
Here is the algorithm proposed in the paper, in short: take the Euclidean norm of each accelerometer reading, collect one second of values, compute the variance of those values, and compare it against static and slow-walking thresholds.
And here is my implementation in Swift 2.0:
import CoreMotion

let motionManager = CMMotionManager()

motionManager.accelerometerUpdateInterval = 0.1

motionManager.startAccelerometerUpdatesToQueue(NSOperationQueue.mainQueue()) { (accelerometerData: CMAccelerometerData?, error: NSError?) -> Void in
    if (error != nil) {
        print(error)
    } else {
        self.estimatePedestrianStatus((accelerometerData?.acceleration)!)
    }
}
After all of the classic Swifty iOS code to initiate CoreMotion, here is the method crunching the numbers and determining the state:
func estimatePedestrianStatus(acceleration: CMAcceleration) {
    // Obtain the Euclidian norm of the accelerometer data
    accelerometerDataInEuclidianNorm = sqrt((acceleration.x.roundTo(roundingPrecision) * acceleration.x.roundTo(roundingPrecision)) + (acceleration.y.roundTo(roundingPrecision) * acceleration.y.roundTo(roundingPrecision)) + (acceleration.z.roundTo(roundingPrecision) * acceleration.z.roundTo(roundingPrecision)))

    // Significant figure setting
    accelerometerDataInEuclidianNorm = accelerometerDataInEuclidianNorm.roundTo(roundingPrecision)

    // record 10 values
    // meaning values in a second
    // accUpdateInterval(0.1s) * 10 = 1s
    while accelerometerDataCount < 1 {
        accelerometerDataCount += 0.1

        accelerometerDataInASecond.append(accelerometerDataInEuclidianNorm)
        totalAcceleration += accelerometerDataInEuclidianNorm

        break // required since we want to obtain data every acc cycle
    }

    // when acc values recorded
    // interpret them
    if accelerometerDataCount >= 1 {
        accelerometerDataCount = 0 // reset for the next round

        // Calculating the variance of the Euclidian norm of the accelerometer data
        let accelerationMean = (totalAcceleration / 10).roundTo(roundingPrecision)

        var total: Double = 0.0
        for data in accelerometerDataInASecond {
            total += ((data - accelerationMean) * (data - accelerationMean)).roundTo(roundingPrecision)
        }
        total = total.roundTo(roundingPrecision)

        let result = (total / 10).roundTo(roundingPrecision)
        print("Result: \(result)")

        if (result < staticThreshold) {
            pedestrianStatus = "Static"
        } else if ((staticThreshold < result) && (result <= slowWalkingThreshold)) {
            pedestrianStatus = "Slow Walking"
        } else if (slowWalkingThreshold < result) {
            pedestrianStatus = "Fast Walking"
        }
        print("Pedestrian Status: \(pedestrianStatus)\n---\n\n")

        // reset for the next round
        accelerometerDataInASecond = []
        totalAcceleration = 0.0
    }
}
Also I've used the following extension to simplify significant figure setting:
extension Double {
    func roundTo(precision: Int) -> Double {
        let divisor = pow(10.0, Double(precision))
        return round(self * divisor) / divisor
    }
}
With raw values from CoreMotion, the algorithm went haywire.
Hope this helps someone.
EDIT (4/3/16)
I forgot to provide my roundingPrecision value. I defined it as 3. It's just plain mathematics that three significant figures are decent enough; you can use more if you like.
Also, one more thing to mention: at the moment this algorithm requires the iPhone to be held in your hand while walking.
My GitHub Repo hosting Pedestrian Status
You can use Apple's machine learning framework CoreML to find out the user's activity. First you need to collect labeled data and train a classifier; then you can use this model in your app to classify user activity. You may follow this series if you are interested in CoreML activity classification:
https://medium.com/@tyler.hutcherson/activity-classification-with-create-ml-coreml3-and-skafos-part-1-8f130b5701f6
