I am trying to create an application that runs an FFT on microphone data, so I can examine, for example, the loudest frequency in the input.
I see that there are many methods of getting audio input (the RemoteIO AudioUnit, AudioQueue services, and AVFoundation) but it seems like AVFoundation is the simplest. I have this setup:
// Configure the audio session
AVAudioSession *session = [AVAudioSession sharedInstance];
[session setCategory:AVAudioSessionCategoryRecord error:NULL];
[session setMode:AVAudioSessionModeMeasurement error:NULL];
[session setActive:YES error:NULL];
// Optional - default gives 1024 samples at 44.1kHz
//[session setPreferredIOBufferDuration:samplesPerSlice/session.sampleRate error:NULL];
// Configure the capture session (strongly-referenced instance variable, otherwise the capture stops after one slice)
_captureSession = [[AVCaptureSession alloc] init];
// Configure audio device input
AVCaptureDevice *device = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
AVCaptureDeviceInput *input = [AVCaptureDeviceInput deviceInputWithDevice:device error:NULL];
[_captureSession addInput:input];
// Configure audio data output
AVCaptureAudioDataOutput *output = [[AVCaptureAudioDataOutput alloc] init];
dispatch_queue_t queue = dispatch_queue_create("My callback", DISPATCH_QUEUE_SERIAL);
[output setSampleBufferDelegate:self queue:queue];
[_captureSession addOutput:output];
// Start the capture session.
[_captureSession startRunning];
(plus error checking, omitted here for readability).
Then I implement the following AVCaptureAudioDataOutputSampleBufferDelegate method:
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
fromConnection:(AVCaptureConnection *)connection
{
NSLog(#"Num samples: %ld", CMSampleBufferGetNumSamples(sampleBuffer));
// Usually gives 1024 (except the first slice)
}
I'm unsure what the next step should be. What exactly does the CMSampleBuffer format describe (and what assumptions can be made about it, if any)? How should I get the raw audio data into vDSP_fft_zrip with the least possible amount of extra preprocessing? (Also, what would you recommend doing to verify that the raw data I see is correct?)
The CMSampleBufferRef is an opaque type that contains 0 or more media samples. There is a bit of blurb in the docs:
http://developer.apple.com/library/ios/#documentation/CoreMedia/Reference/CMSampleBuffer/Reference/reference.html
In this case it will contain an audio buffer, as well as the description of the sample format and timing information and so on. If you are really interested just put a breakpoint in the delegate callback and take a look.
The first step is to get a pointer to the data buffer that has been returned:
// get a pointer to the audio bytes
CMItemCount numSamples = CMSampleBufferGetNumSamples(sampleBuffer);
CMBlockBufferRef audioBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
size_t lengthAtOffset;
size_t totalLength;
char *samples;
CMBlockBufferGetDataPointer(audioBuffer, 0, &lengthAtOffset, &totalLength, &samples);
The default sample format for the iPhone mic is linear PCM with 16-bit samples. This may be mono or stereo depending on whether an external mic is attached. To calculate the FFT we need a float vector. Fortunately there is an Accelerate function to do the conversion for us:
// check what sample format we have
// this should always be linear PCM
// but may have 1 or 2 channels
CMAudioFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);
const AudioStreamBasicDescription *desc = CMAudioFormatDescriptionGetStreamBasicDescription(format);
assert(desc->mFormatID == kAudioFormatLinearPCM);
if (desc->mChannelsPerFrame == 1 && desc->mBitsPerChannel == 16) {
float *convertedSamples = malloc(numSamples * sizeof(float));
vDSP_vflt16((short *)samples, 1, convertedSamples, 1, numSamples);
} else {
// handle other cases as required
}
Now you have a float vector of the sample buffer that you can use with vDSP_fft_zrip. It doesn't seem possible to change the microphone's input format to float samples with AVFoundation, so you are stuck with this last conversion step. In practice I would keep the buffers around, reallocating them only when a larger buffer arrives, so that you are not mallocing and freeing on every delegate callback.
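For completeness, here is a minimal sketch of the FFT step itself, assuming a mono float buffer convertedSamples of length numSamples, where numSamples is a power of two (1024 in the default case). The FFT setup and the scratch buffers should really be allocated once and reused across callbacks:
#import <Accelerate/Accelerate.h>
#import <math.h>
// Create the setup once and reuse it (vDSP_destroy_fftsetup when done)
vDSP_Length log2n = (vDSP_Length)round(log2((double)numSamples));
FFTSetup fftSetup = vDSP_create_fftsetup(log2n, kFFTRadix2);
// Pack the real samples into the split-complex layout vDSP_fft_zrip expects
DSPSplitComplex splitComplex;
splitComplex.realp = malloc((numSamples / 2) * sizeof(float));
splitComplex.imagp = malloc((numSamples / 2) * sizeof(float));
vDSP_ctoz((DSPComplex *)convertedSamples, 2, &splitComplex, 1, numSamples / 2);
// In-place forward real FFT (note: output is scaled, and DC/Nyquist are packed into bin 0)
vDSP_fft_zrip(fftSetup, &splitComplex, 1, log2n, kFFTDirection_Forward);
// Magnitude of each bin; bin i corresponds to i * sampleRate / numSamples Hz
float *magnitudes = malloc((numSamples / 2) * sizeof(float));
vDSP_zvabs(&splitComplex, 1, magnitudes, 1, numSamples / 2);
// Index of the loudest bin
float maxMagnitude = 0;
vDSP_Length maxBin = 0;
vDSP_maxvi(magnitudes, 1, &maxMagnitude, &maxBin, numSamples / 2);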
As for your last question, I guess the easiest way to do this would be to inject a known input and check that it gives you the correct response. You could play a sine wave into the mic and check that your FFT had a peak in the correct frequency bin, something like that.
I don't suggest using AVFoundation, for three reasons:
1. I used it for some of my apps (morsedec, irtty); it works well on the simulator and on some hardware, but on other devices it failed completely.
2. You do not have good control over the sample rate and format.
3. Latency can be high.
I suggest starting with Apple's sample code aurioTouch.
To compute the FFT you can move to the vDSP framework, using a circular buffer (I LOVE https://github.com/michaeltyson/TPCircularBuffer); a rough sketch follows below.
Hope this helps.
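As a rough sketch of the circular-buffer idea (assuming TPCircularBuffer's ProduceBytes/Tail/Consume API and a 1024-sample FFT frame): the audio callback pushes raw 16-bit samples in, and a consumer pulls fixed-size frames back out for vDSP whenever enough data has accumulated.
#import "TPCircularBuffer.h"
TPCircularBuffer circularBuffer;
TPCircularBufferInit(&circularBuffer, 64 * 1024);   // capacity in bytes
// Producer side (audio callback): append the incoming 16-bit samples
// TPCircularBufferProduceBytes(&circularBuffer, samples, (int32_t)numBytes);
// Consumer side: take one FFT frame at a time once enough data is buffered
int32_t availableBytes = 0;
int32_t frameBytes = (int32_t)(1024 * sizeof(SInt16));
SInt16 *tail = (SInt16 *)TPCircularBufferTail(&circularBuffer, &availableBytes);
while (tail && availableBytes >= frameBytes) {
    // ... convert this frame to float and run vDSP_fft_zrip on it ...
    TPCircularBufferConsume(&circularBuffer, frameBytes);
    tail = (SInt16 *)TPCircularBufferTail(&circularBuffer, &availableBytes);
}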
Related
Scenario
I am working on an application that does video processing and streaming. I already have video capture from the back camera streaming perfectly. The problem is that I also have to process the video data, but only locally. As it turns out, the API I am using for the local video processing requires a different pixel format than the APIs I am using to stream the data to my server. It seems I need two separate sessions capturing video from the back camera simultaneously, so one session can do the processing and the other the streaming.
Problem
Every time I attempt to create a new session to use the same AVCaptureDevice (back), my streaming immediately stops. Code below:
captureSession = [[AVCaptureSession alloc] init];
AVCaptureDeviceInput *videoIn = [[AVCaptureDeviceInput alloc]
initWithDevice:[self videoDeviceWithPosition:AVCaptureDevicePositionBack]
error:nil];
if ([captureSession canAddInput:videoIn])
{
[captureSession addInput:videoIn];
}
AVCaptureVideoDataOutput *videoOut = [[AVCaptureVideoDataOutput alloc] init];
[videoOut setAlwaysDiscardsLateVideoFrames:YES];
[videoOut setVideoSettings:
@{(id)kCVPixelBufferPixelFormatTypeKey: @(kCVPixelFormatType_32BGRA)}];
dispatch_queue_t videoCaptureQueue =
dispatch_queue_create("Video Process Queue", DISPATCH_QUEUE_SERIAL);
[videoOut setSampleBufferDelegate:self queue:videoCaptureQueue];
if ([captureSession canAddOutput:videoOut]) {
[captureSession addOutput:videoOut];
}
I receive an interruption reason videoDeviceInUseByAnotherClient.
videoDeviceInUseByAnotherClient: An interruption caused by the video device temporarily being made unavailable (for example, when used by another capture session).
I have also tried adding the output of the original capture session to the new session, but every time the canAddOutput: method returns NO. My guess is that this is because the output is already associated with another session.
Question
How do I use the same AVCaptureDevice to output to two separate AVCaptureVideoDataOutputs at the same time? Or how can I achieve the same thing as the diagram below?
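One possible direction (sketched here, not from the thread) is to keep a single session and a single AVCaptureVideoDataOutput, and produce the second pixel format in software inside the delegate callback. The localProcessor, streamer, processPixelBuffer:, convertToStreamingFormat: and encodeAndSendPixelBuffer: names are hypothetical placeholders; CoreImage, vImage, or libyuv could do the actual conversion.
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    // Consumer 1: local processing, using the 32BGRA buffer as captured
    [self.localProcessor processPixelBuffer:pixelBuffer];                       // hypothetical
    // Consumer 2: streaming, which needs a different pixel format, produced here in software
    CVPixelBufferRef converted = [self convertToStreamingFormat:pixelBuffer];   // hypothetical
    [self.streamer encodeAndSendPixelBuffer:converted];                         // hypothetical
    CVPixelBufferRelease(converted);
}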
Objective
Read an m4a file bought from the iTunes Store via AVAssetReader.
Stream it over HTTP, to be consumed by MobileVLCKit.
What I've tried
As far as I know, AVAssetReader only produces raw audio data, so I guess I should add an ADTS header in front of every sample.
NSError *error = nil;
AVAssetReader* reader = [[AVAssetReader alloc] initWithAsset:asset error:&error];
if (error != nil) {
NSLog(#"%#", [error localizedDescription]);
return -1;
}
AVAssetTrack* track = [asset.tracks objectAtIndex:0];
AVAssetReaderTrackOutput *readerOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:track
outputSettings:nil];
[reader addOutput:readerOutput];
[reader startReading];
while (reader.status == AVAssetReaderStatusReading){
AVAssetReaderTrackOutput * trackOutput = (AVAssetReaderTrackOutput *)[reader.outputs objectAtIndex:0];
CMSampleBufferRef sampleBufferRef;
@synchronized(self) {
sampleBufferRef = [trackOutput copyNextSampleBuffer];
}
CMItemCount numSamples = CMSampleBufferGetNumSamples(sampleBufferRef);
...
}
So, my question is, how do I loop every sample and add ADTS header?
First, you don't need trackOutput; it's the same object as the readerOutput you already have.
UPDATE
My mistake, you're absolutely right. I thought the usual 0xFFF sync words were part of AAC, but they're actually ADTS headers. So you must add an ADTS header to each of your AAC packets to stream them as ADTS or "aac". I think you have two choices:
1. Use AudioFileInitializeWithCallbacks + kAudioFileAAC_ADTSType to get the AudioFile API to add the headers for you. You write AAC packets to the AudioFileID, and it calls your write callback, from which you can stream AAC in ADTS.
2. Add the headers to the packets yourself. They're only 7 fiddly bytes (9 with a CRC, but who uses one?). Some readable implementations here and here; a rough sketch also follows below.
Either way you need to call either CMSampleBufferGetAudioStreamPacketDescriptions or CMSampleBufferCallBlockForEachSample to get the individual AAC packets from a CMSampleBufferRef.
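For option 2, here is a rough sketch of building the 7-byte header by hand, assuming AAC-LC at 44.1 kHz, stereo; adjust the profile, frequency index, and channel configuration to match your track's actual format:
static void fillADTSHeader(uint8_t header[7], size_t aacPacketLength) {
    const int profile = 2;    // AAC LC (written as profile - 1 below)
    const int freqIdx = 4;    // 44100 Hz
    const int chanCfg = 2;    // stereo
    size_t frameLength = aacPacketLength + 7;                               // the 13-bit length field includes the header
    header[0] = 0xFF;                                                       // syncword 0xFFF...
    header[1] = 0xF1;                                                       // ...syncword, MPEG-4 ID, layer 0, no CRC
    header[2] = (uint8_t)(((profile - 1) << 6) + (freqIdx << 2) + (chanCfg >> 2));
    header[3] = (uint8_t)(((chanCfg & 3) << 6) + (frameLength >> 11));
    header[4] = (uint8_t)((frameLength & 0x7FF) >> 3);
    header[5] = (uint8_t)(((frameLength & 7) << 5) + 0x1F);                 // buffer fullness = 0x7FF
    header[6] = 0xFC;                                                       // rest of fullness, one raw data block
}
Write those 7 bytes followed by the raw AAC packet for every packet you pull out of the CMSampleBufferRef.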
I'm trying to record audio to a file (working) and then sample the data in that file (giving strange results).
FYI, I am roughly following this code...
Extracting Amplitude Data from Linear PCM on the iPhone
I have noticed a few different results. For simplicity, assume the record time is fixed at 1 second.
1. When sampling at up to 8,000 samples/sec, the mutable array (see code) lists 8,000 entries, but only the first 4,000 have real-looking data; the last 4,000 points are all the same value (the exact value varies from run to run).
2. Somewhat related to issue #1: when sampling above 8,000 samples/sec, the first half of the samples (e.g. 5,000 of a 10,000-sample set recorded at 10,000 samples/sec for 1 second) look like real data, while the second half of the set is fixed at some value (again, the exact value varies from run to run). See the snippet below from my debug window; the first number is the packetIndex, the second is the buffer value.
4996:-137
4997:1043
4998:-405
4999:-641
5000:195   <- notice the switch from random data to a constant value at 5k, for the 10k-sample file
5001:195
5002:195
5003:195
5004:195
3. When the mic listens to a nearby speaker playing a 1 kHz sinusoidal tone, sampled at 40,000 samples per second, the resulting data plotted in a spreadsheet shows the signal at about 2 kHz, i.e. doubled.
Any ideas what I may be doing wrong here?
Here is my setup work to record the audio from the mic...
-(void) initAudioSession {
// setup av session
AVAudioSession *audioSession = [AVAudioSession sharedInstance];
[audioSession setCategory:AVAudioSessionCategoryPlayAndRecord error:nil];
[audioSession setActive:YES error: nil];
NSLog(#"audio session initiated");
// settings for the recorded file
NSDictionary *recordSettings = [[NSDictionary alloc] initWithObjectsAndKeys:
[NSNumber numberWithFloat:SAMPLERATE],AVSampleRateKey,
[NSNumber numberWithInt:kAudioFormatLinearPCM],AVFormatIDKey,
[NSNumber numberWithInt:1],AVNumberOfChannelsKey,
[NSNumber numberWithInt:16],AVEncoderBitRateKey,
[NSNumber numberWithInt:AVAudioQualityMax],AVEncoderAudioQualityKey, nil];
// setup file name and location
NSString *docDir = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) lastObject];
fileURL = [NSURL fileURLWithPath:[docDir stringByAppendingPathComponent:@"input.caf"]];//caf or aif?
// initialize my new audio recorder
newAudioRecorder = [[AVAudioRecorder alloc] initWithURL:fileURL settings:recordSettings error:nil];
// show file location so i can check it with some player
NSLog(#"file path = %#",fileURL);
// check if the recorder exists, if so prepare the recorder, if not tell me in debug window
if (newAudioRecorder) {
[newAudioRecorder setDelegate:self];
[newAudioRecorder prepareToRecord];
[self.setupStatus setText:[NSString stringWithFormat:@"recorder ready"]];
}else{
NSLog(#"error setting up recorder");
}
}
Here is my code for loading the recorded file and grabbing the data...
//loads file and go thru values, converts data to be put into an NSMutableArray
-(void)readingRainbow{
// get audio file and put into a file ID
AudioFileID fileID;
AudioFileOpenURL((__bridge CFURLRef)fileURL, kAudioFileReadPermission, kAudioFileCAFType /*kAudioFileAIFFType*/ , &fileID);
// get number of packets of audio contained in file
// instead of getting packets, i just set them to the duration times the sample rate i set
// not sure if this is a valid approach
UInt64 totalPacketCount = SAMPLERATE*timer;
// get size of each packet, is this valid?
UInt32 maxPacketSizeInBytes = sizeof(SInt32);
// setup to extract audio data
UInt32 totPack32 = SAMPLERATE*timer;
UInt32 ioNumBytes = totPack32*maxPacketSizeInBytes;
SInt16 *outBuffer = malloc(ioNumBytes);
memset(outBuffer, 0, ioNumBytes);
// setup array to put buffer samples in
readArray = [[NSMutableArray alloc] initWithObjects: nil];
NSNumber *arrayData;
SInt16 data;
int data2;
// this may be where i need help as well....
// process every packet
for (SInt64 packetIndex = 0; packetIndex<totalPacketCount; packetIndex++) {
// method description for reference..
// AudioFileReadPackets(<#AudioFileID inAudioFile#>, <#Boolean inUseCache#>, <#UInt32 *outNumBytes#>,
// <#AudioStreamPacketDescription *outPacketDescriptions#>, <#SInt64 inStartingPacket#>,
// <#UInt32 *ioNumPackets#>, <#void *outBuffer#>)
// extract packet data, not sure if i'm setting this up properly
AudioFileReadPackets(fileID, false, &ioNumBytes, NULL, packetIndex, &totPack32, outBuffer);
// get buffer data and pass into mutable array
data = outBuffer[packetIndex];
data2=data;
arrayData = [[NSNumber alloc] initWithInt:data2];
[readArray addObject:arrayData];
// printf("%lld:%d\n",packetIndex,data);
printf("%d,",data);
}
Also, I'm using this method to start the recorder...
[newAudioRecorder recordForDuration:timer];
Thoughts? I'm a noob, so any info is greatly appreciated!
You may be recording 16-bit samples, but trying to read 32-bit samples from the file data, thus only finding half as many samples (the rest may be garbage).
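Not the original code, but a sketch of the corrected read under that assumption (16-bit mono linear PCM in the CAF file), keeping the question's SAMPLERATE * timer packet-count estimate:
UInt32 packetCount = SAMPLERATE * timer;           // as in the question; kAudioFilePropertyAudioDataPacketCount would be more robust
UInt32 bytesPerPacket = sizeof(SInt16);            // 16-bit mono PCM: one packet == one frame == 2 bytes
UInt32 ioNumBytes = packetCount * bytesPerPacket;
SInt16 *outBuffer = malloc(ioNumBytes);
// One call reads every packet, starting at packet 0
UInt32 ioNumPackets = packetCount;
AudioFileReadPackets(fileID, false, &ioNumBytes, NULL, 0, &ioNumPackets, outBuffer);
for (UInt32 i = 0; i < ioNumPackets; i++) {
    [readArray addObject:[NSNumber numberWithInt:outBuffer[i]]];
}
free(outBuffer);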
I don't know which value to use to fetch raw YUV420p data. Code below first:
AVCaptureVideoDataOutput *output = [[AVCaptureVideoDataOutput alloc] init];
output.alwaysDiscardsLateVideoFrames = YES;
output.videoSettings = @{(id)kCVPixelBufferPixelFormatTypeKey: [NSNumber numberWithUnsignedInt:kCVPixelFormatType_420YpCbCr8BiPlanarFullRange]};
//output.videoSettings = @{(id)kCVPixelBufferPixelFormatTypeKey: [NSNumber numberWithUnsignedInt:kCVPixelFormatType_32BGRA]};
dispatch_queue_t queue;
queue = dispatch_queue_create("CameraQueue", NULL);
[output setSampleBufferDelegate:self queue:queue];
[session addOutput:output];
I noticed that kCVPixelFormatType has several values; does somebody know which one is right for fetching raw YUV420p data?
kCVPixelFormatType_420YpCbCr8Planar
kCVPixelFormatType_420YpCbCr8PlanarFullRange
kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
Could it be one of these?
It depends on which particular YUV420 you want to get:
Planar/Biplanar refers to the arrangement of the luma and chroma components in memory: Planar means each component has its own buffer (contiguous or not), while Biplanar means two buffers, one for luma and one for chroma, with the chroma samples usually interleaved. An example of Planar is the classic YUV420 (I420) format; examples of Biplanar are NV12 and NV21.
VideoRange and FullRange refer to the values of the luma component: VideoRange accepts levels in [16, 235], while FullRange uses [0, 255]. This confusing convention comes from the MPEG standard (see here)...
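If you pick one of the BiPlanar (NV12-style) formats, here is a sketch (not part of the original answer) of reading the two planes in captureOutput:didOutputSampleBuffer:fromConnection:, assuming the kCVPixelFormatType_420YpCbCr8BiPlanarFullRange setting shown above:
CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
// Plane 0: Y (luma), full resolution
uint8_t *yPlane = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
size_t yBytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);
size_t yHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
// Plane 1: interleaved CbCr (chroma), half resolution in both dimensions
uint8_t *cbcrPlane = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1);
size_t cbcrBytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1);
// ... copy or process the planes here ...
CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);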
I have an iOS app that uses the front camera of the phone and sets up an AVCaptureSession to read the incoming camera data. I set up a simple frame counter to check the speed of incoming data and, to my surprise, when the camera is in low light the frame rate (measured using the imagecount variable in the code) is very slow, but as soon as I move the phone into a brightly lit area the frame rate almost triples. I would like to keep the high frame rate of image processing throughout and have set the minFrameDuration variable to 30 fps, but that didn't help. Any ideas why this behaviour occurs?
Code to create the capture session is below:
#pragma mark Create and configure a capture session and start it running
- (void)setupCaptureSession
{
NSError *error = nil;
// Create the session
session = [[AVCaptureSession alloc] init];
// Configure the session to produce lower resolution video frames, if your
// processing algorithm can cope. We'll specify medium quality for the
// chosen device.
session.sessionPreset = AVCaptureSessionPresetLow;
// Find a suitable AVCaptureDevice
//AVCaptureDevice *device=[AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
NSArray *devices = [AVCaptureDevice devices];
AVCaptureDevice *frontCamera;
AVCaptureDevice *backCamera;
for (AVCaptureDevice *device in devices) {
if ([device hasMediaType:AVMediaTypeVideo]) {
if ([device position] == AVCaptureDevicePositionFront) {
backCamera = device;
}
else {
frontCamera = device;
}
}
}
//Create a device input with the device and add it to the session.
AVCaptureDeviceInput *input = [AVCaptureDeviceInput deviceInputWithDevice:backCamera
error:&error];
if (!input) {
//Handling the error appropriately.
}
[session addInput:input];
// Create a VideoDataOutput and add it to the session
AVCaptureVideoDataOutput *output = [[AVCaptureVideoDataOutput alloc] init];
[session addOutput:output];
// Configure your output.
dispatch_queue_t queue = dispatch_queue_create("myQueue", NULL);
[output setSampleBufferDelegate:self queue:queue];
dispatch_release(queue);
// Specify the pixel format
output.videoSettings =
[NSDictionary dictionaryWithObject:
[NSNumber numberWithInt:kCVPixelFormatType_32BGRA]
forKey:(id)kCVPixelBufferPixelFormatTypeKey];
// If you wish to cap the frame rate to a known value, such as 30 fps, set
// minFrameDuration.
output.minFrameDuration = CMTimeMake(1,30);
//Start the session running to start the flow of data
[session startRunning];
}
#pragma mark Delegate routine that is called when a sample buffer was written
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
fromConnection:(AVCaptureConnection *)connection
{
//counter to track frame rate
imagecount++;
//display to help see speed of images being processed on ios app
NSString *recognized = [[NSString alloc] initWithFormat:@"IMG COUNT - %d",imagecount];
[self performSelectorOnMainThread:@selector(debuggingText:) withObject:recognized waitUntilDone:YES];
}
When there is less light, the camera requires a longer exposure to get the same signal to noise ratio in each pixel. That is why you might expect the frame rate to drop in low light.
You are setting minFrameDuration to 1/30 s in an attempt to prevent long-exposure frames from slowing down the frame rate. However, you should be setting maxFrameDuration instead: your code as-is says the frame rate is no faster than 30 FPS, but it could be 10 FPS, or 1 FPS....
Also, the documentation says to bracket any changes to these parameters with lockForConfiguration: and unlockForConfiguration:, so it may be that your changes simply didn't take.
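Not from the original answer, but a sketch of doing that at the device level, assuming iOS 7+ (where activeVideoMinFrameDuration / activeVideoMaxFrameDuration live on AVCaptureDevice), that the chosen format actually supports 30 fps, and the backCamera variable from the question's code:
NSError *configError = nil;
if ([backCamera lockForConfiguration:&configError]) {
    // activeVideoMaxFrameDuration bounds the longest frame time, i.e. the slowest frame rate;
    // pinning both to 1/30 s requests a constant 30 fps (at the cost of darker frames in low light)
    backCamera.activeVideoMinFrameDuration = CMTimeMake(1, 30);
    backCamera.activeVideoMaxFrameDuration = CMTimeMake(1, 30);
    [backCamera unlockForConfiguration];
} else {
    NSLog(@"Could not lock device for configuration: %@", configError);
}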