I'm trying to reverse an AVAsset audio and save it to a file. To make things clear, I've made simple application with the issue https://github.com/ksenia-lyagusha/AudioReverse.git
The application takes mp4 video file from bundle, exports it to Temporary folder in the sandbox as single m4a file, then tries to read it from there, reverse and save result file back.
Temporary m4a file is OK.
The only result of my reverse part is Audio file in the Sandbox with white noise.
There is the part of code below, that is in charge of reversing AVAsset. It is based on related questions
How to reverse an audio file?
iOS audio manipulation - play local .caf file backwards
However, it doesn't work for me.
OSStatus theErr = noErr;
UInt64 fileDataSize = 0;
AudioFileID inputAudioFile;
AudioStreamBasicDescription theFileFormat;
UInt32 thePropertySize = sizeof(theFileFormat);
theErr = AudioFileOpenURL((__bridge CFURLRef)[NSURL URLWithString:inputPath], kAudioFileReadPermission, 0, &inputAudioFile);
thePropertySize = sizeof(fileDataSize);
theErr = AudioFileGetProperty(inputAudioFile, kAudioFilePropertyAudioDataByteCount, &thePropertySize, &fileDataSize);
UInt32 ps = sizeof(AudioStreamBasicDescription) ;
AudioFileGetProperty(inputAudioFile, kAudioFilePropertyDataFormat, &ps, &theFileFormat);
UInt64 dataSize = fileDataSize;
void *theData = malloc(dataSize);
// set up output file
AudioFileID outputAudioFile;
AudioStreamBasicDescription myPCMFormat;
myPCMFormat.mSampleRate = 44100;
myPCMFormat.mFormatID = kAudioFormatLinearPCM;
// kAudioFormatFlagsCanonical is deprecated
myPCMFormat.mFormatFlags = kAudioFormatFlagIsFloat | kAudioFormatFlagIsNonInterleaved;
myPCMFormat.mChannelsPerFrame = 1;
myPCMFormat.mFramesPerPacket = 1;
myPCMFormat.mBitsPerChannel = 32;
myPCMFormat.mBytesPerPacket = (myPCMFormat.mBitsPerChannel / 8) * myPCMFormat.mChannelsPerFrame;
myPCMFormat.mBytesPerFrame = myPCMFormat.mBytesPerPacket;
NSString *exportPath = [NSTemporaryDirectory() stringByAppendingPathComponent:#"ReverseAudio.caf"];
NSURL *outputURL = [NSURL fileURLWithPath:exportPath];
theErr = AudioFileCreateWithURL((__bridge CFURLRef)outputURL,
kAudioFileCAFType,
&myPCMFormat,
kAudioFileFlags_EraseFile,
&outputAudioFile);
//Read data into buffer
//if readPoint = dataSize, then bytesToRead = 0 in while loop and
//it is endless
SInt64 readPoint = dataSize-1;
UInt64 writePoint = 0;
while(readPoint > 0)
{
UInt32 bytesToRead = 2;
AudioFileReadBytes(inputAudioFile, false, readPoint, &bytesToRead, theData);
// bytesToRead is now the amount of data actually read
UInt32 bytesToWrite = bytesToRead;
AudioFileWriteBytes(outputAudioFile, false, writePoint, &bytesToWrite, theData);
// bytesToWrite is now the amount of data actually written
writePoint += bytesToWrite;
readPoint -= bytesToRead;
}
free(theData);
AudioFileClose(inputAudioFile);
AudioFileClose(outputAudioFile);
If I change file type in AudioFileCreateWithURL from kAudioFileCAFType to another the result file is not created in the Sandbox at all.
Thanks for any help.
You get white noise because your in and out file formats are incompatible. You have different sample rates and channels and probably other differences. To make this work you need to have a common (PCM) format mediating between reads and writes. This is a reasonable job for the new(ish) AVAudio frameworks. We read from file to PCM, shuffle the buffers, then write from PCM to file. This approach is not optimised for large files, as all data is read into the buffers in one go, but is enough to get you started.
You can call this method from your getAudioFromVideo completion block. Error handling ignored for clarity.
- (void)readAudioFromURL:(NSURL*)inURL reverseToURL:(NSURL*)outURL {
//prepare the in and outfiles
AVAudioFile* inFile =
[[AVAudioFile alloc] initForReading:inURL error:nil];
AVAudioFormat* format = inFile.processingFormat;
AVAudioFrameCount frameCount =(UInt32)inFile.length;
NSDictionary* outSettings = #{
AVNumberOfChannelsKey:#(format.channelCount)
,AVSampleRateKey:#(format.sampleRate)};
AVAudioFile* outFile =
[[AVAudioFile alloc] initForWriting:outURL
settings:outSettings
error:nil];
//prepare the forward and reverse buffers
self.forwaredBuffer =
[[AVAudioPCMBuffer alloc] initWithPCMFormat:format
frameCapacity:frameCount];
self.reverseBuffer =
[[AVAudioPCMBuffer alloc] initWithPCMFormat:format
frameCapacity:frameCount];
//read file into forwardBuffer
[inFile readIntoBuffer:self.forwaredBuffer error:&error];
//set frameLength of reverseBuffer to forwardBuffer framelength
AVAudioFrameCount frameLength = self.forwaredBuffer.frameLength;
self.reverseBuffer.frameLength = frameLength;
//iterate over channels
//stride is 1 or 2 depending on interleave format
NSInteger stride = self.forwaredBuffer.stride;
for (AVAudioChannelCount channelIdx = 0;
channelIdx < self.forwaredBuffer.format.channelCount;
channelIdx++) {
float* forwaredChannelData =
self.forwaredBuffer.floatChannelData[channelIdx];
float* reverseChannelData =
self.reverseBuffer.floatChannelData[channelIdx];
int32_t reverseIdx = 0;
//iterate over samples, allocate to reverseBuffer in reverse order
for (AVAudioFrameCount frameIdx = frameLength;
frameIdx >0;
frameIdx--) {
float sample = forwaredChannelData[frameIdx*stride];
reverseChannelData[reverseIdx*stride] = sample;
reverseIdx++;
}
}
//write reverseBuffer to outFile
[outFile writeFromBuffer:self.reverseBuffer error:nil];
}
I wasn't able to find the problem in your code, however I suggest you reversing AVAsset using AVAssetWriter. Following code is based on iOS reverse audio through AVAssetWritet. I've added additional method there to make it work. Finally I've got reversed file.
static NSMutableArray *samples;
static OSStatus sampler(CMSampleBufferRef sampleBuffer, CMItemCount index, void *refcon)
{
[samples addObject:(__bridge id _Nonnull)(sampleBuffer)];
return noErr;
}
- (void)reversePlayAudio:(NSURL *)inputURL
{
AVAsset *asset = [AVAsset assetWithURL:inputURL];
AVAssetReader* reader = [[AVAssetReader alloc] initWithAsset:asset error:nil];
AVAssetTrack* audioTrack = [[asset tracksWithMediaType:AVMediaTypeAudio] objectAtIndex:0];
NSMutableDictionary* audioReadSettings = [NSMutableDictionary dictionary];
[audioReadSettings setValue:[NSNumber numberWithInt:kAudioFormatLinearPCM]
forKey:AVFormatIDKey];
AVAssetReaderTrackOutput* readerOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:audioTrack outputSettings:audioReadSettings];
[reader addOutput:readerOutput];
[reader startReading];
NSDictionary *outputSettings = #{AVFormatIDKey : #(kAudioFormatMPEG4AAC),
AVSampleRateKey : #(44100.0),
AVNumberOfChannelsKey : #(1),
AVEncoderBitRateKey : #(128000),
AVChannelLayoutKey : [NSData data]};
AVAssetWriterInput *writerInput = [[AVAssetWriterInput alloc] initWithMediaType:AVMediaTypeAudio
outputSettings:outputSettings];
NSString *exportPath = [NSTemporaryDirectory() stringByAppendingPathComponent:#"reverseAudio.m4a"];
NSURL *exportURL = [NSURL fileURLWithPath:exportPath];
NSError *writerError = nil;
AVAssetWriter *writer = [[AVAssetWriter alloc] initWithURL:exportURL
fileType:AVFileTypeAppleM4A
error:&writerError];
[writerInput setExpectsMediaDataInRealTime:NO];
writer.shouldOptimizeForNetworkUse = NO;
[writer addInput:writerInput];
[writer startWriting];
[writer startSessionAtSourceTime:kCMTimeZero];
CMSampleBufferRef sample;// = [readerOutput copyNextSampleBuffer];
samples = [[NSMutableArray alloc] init];
while (sample != NULL) {
sample = [readerOutput copyNextSampleBuffer];
if (sample == NULL)
continue;
CMSampleBufferCallForEachSample(sample, &sampler, NULL);
CFRelease(sample);
}
NSArray* reversedSamples = [[samples reverseObjectEnumerator] allObjects];
for (id reversedSample in reversedSamples) {
if (writerInput.readyForMoreMediaData) {
[writerInput appendSampleBuffer:(__bridge CMSampleBufferRef)(reversedSample)];
}
else {
[NSThread sleepForTimeInterval:0.05];
}
}
[samples removeAllObjects];
[writerInput markAsFinished];
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0);
dispatch_async(queue, ^{
[writer finishWritingWithCompletionHandler:^{
// writing is finished
// reversed audio file in TemporaryDirectory in the Sandbox
}];
});
}
Known issues of the code.
There might be some problems with the memory, if the audio is long.
The audio file's duration is longer than original's. (As a quick fix you might cut it down as usual AVAsset).
Related
I have a React Native (Expo) app which captures audio using the expo-av library.
It then uploads the audio file to Amazon S3, and then Transcribes that in Amazon Transcribe.
For Android , i save the audio as a '.m4a' file, and call the Amazon Transcribe API as :
transcribe_client.start_transcription_job(TranscriptionJobName = job_name,
Media={'MediaFileUri' : file_uri},
MediaFormat='mp4',
LanguageCode='en-US')
What should the 'MediaFormat' be for upload from an iOS device, which will typically be a '.caf' file ?
Amazon Transcribe only allows these media formats
MP3, MP4, WAV, FLAC, AMR, OGG, and WebM
Possible solutions:
Create an API wich does the conversion for you.
You can easly create one using for example the FFMPEG python library.
Use an already made API.
By using the cloudconvert API you can convert the file with ease, but only if you pay for it.
Use an different library to record the IOS audio.
There's an module called react-native-record-audio-ios wich is made entirely for IOS and record audio in .caf, .m4a, and .wav.
Use the LAME api to convert it.
As said here, you can convert a .caf file into a .mp3 one by probably creating a native module wich would run this:
FILE *pcm = fopen("file.caf", "rb");
FILE *mp3 = fopen("file.mp3", "wb");
const int PCM_SIZE = 8192;
const int MP3_SIZE = 8192;
short int pcm_buffer[PCM_SIZE*2];
unsigned char mp3_buffer[MP3_SIZE];
lame_t lame = lame_init();
lame_set_in_samplerate(lame, 44100);
lame_set_VBR(lame, vbr_default);
lame_init_params(lame);
do {
read = fread(pcm_buffer, 2*sizeof(short int), PCM_SIZE, pcm);
if (read == 0)
write = lame_encode_flush(lame, mp3_buffer, MP3_SIZE);
else
write = lame_encode_buffer_interleaved(lame, pcm_buffer, read, mp3_buffer, MP3_SIZE);
fwrite(mp3_buffer, write, 1, mp3);
} while (read != 0);
lame_close(lame);
fclose(mp3);
fclose(pcm);
Creating an native module who runs this objective-c code:
-(void) convertToWav
{
// set up an AVAssetReader to read from the iPod Library
NSString *cafFilePath=[[NSBundle mainBundle]pathForResource:#"test" ofType:#"caf"];
NSURL *assetURL = [NSURL fileURLWithPath:cafFilePath];
AVURLAsset *songAsset = [AVURLAsset URLAssetWithURL:assetURL options:nil];
NSError *assetError = nil;
AVAssetReader *assetReader = [AVAssetReader assetReaderWithAsset:songAsset
error:&assetError]
;
if (assetError) {
NSLog (#"error: %#", assetError);
return;
}
AVAssetReaderOutput *assetReaderOutput = [AVAssetReaderAudioMixOutput
assetReaderAudioMixOutputWithAudioTracks:songAsset.tracks
audioSettings: nil];
if (! [assetReader canAddOutput: assetReaderOutput]) {
NSLog (#"can't add reader output... die!");
return;
}
[assetReader addOutput: assetReaderOutput];
NSString *title = #"MyRec";
NSArray *docDirs = NSSearchPathForDirectoriesInDomains (NSDocumentDirectory, NSUserDomainMask, YES);
NSString *docDir = [docDirs objectAtIndex: 0];
NSString *wavFilePath = [[docDir stringByAppendingPathComponent :title]
stringByAppendingPathExtension:#"wav"];
if ([[NSFileManager defaultManager] fileExistsAtPath:wavFilePath])
{
[[NSFileManager defaultManager] removeItemAtPath:wavFilePath error:nil];
}
NSURL *exportURL = [NSURL fileURLWithPath:wavFilePath];
AVAssetWriter *assetWriter = [AVAssetWriter assetWriterWithURL:exportURL
fileType:AVFileTypeWAVE
error:&assetError];
if (assetError)
{
NSLog (#"error: %#", assetError);
return;
}
AudioChannelLayout channelLayout;
memset(&channelLayout, 0, sizeof(AudioChannelLayout));
channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
NSDictionary *outputSettings = [NSDictionary dictionaryWithObjectsAndKeys:
[NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
[NSNumber numberWithFloat:44100.0], AVSampleRateKey,
[NSNumber numberWithInt:2], AVNumberOfChannelsKey,
[NSData dataWithBytes:&channelLayout length:sizeof(AudioChannelLayout)], AVChannelLayoutKey,
[NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
[NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
[NSNumber numberWithBool:NO],AVLinearPCMIsFloatKey,
[NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
nil];
AVAssetWriterInput *assetWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio
outputSettings:outputSettings];
if ([assetWriter canAddInput:assetWriterInput])
{
[assetWriter addInput:assetWriterInput];
}
else
{
NSLog (#"can't add asset writer input... die!");
return;
}
assetWriterInput.expectsMediaDataInRealTime = NO;
[assetWriter startWriting];
[assetReader startReading];
AVAssetTrack *soundTrack = [songAsset.tracks objectAtIndex:0];
CMTime startTime = CMTimeMake (0, soundTrack.naturalTimeScale);
[assetWriter startSessionAtSourceTime: startTime];
__block UInt64 convertedByteCount = 0;
dispatch_queue_t mediaInputQueue = dispatch_queue_create("mediaInputQueue", NULL);
[assetWriterInput requestMediaDataWhenReadyOnQueue:mediaInputQueue
usingBlock: ^
{
while (assetWriterInput.readyForMoreMediaData)
{
CMSampleBufferRef nextBuffer = [assetReaderOutput copyNextSampleBuffer];
if (nextBuffer)
{
// append buffer
[assetWriterInput appendSampleBuffer: nextBuffer];
convertedByteCount += CMSampleBufferGetTotalSampleSize (nextBuffer);
CMTime progressTime = CMSampleBufferGetPresentationTimeStamp(nextBuffer);
CMTime sampleDuration = CMSampleBufferGetDuration(nextBuffer);
if (CMTIME_IS_NUMERIC(sampleDuration))
progressTime= CMTimeAdd(progressTime, sampleDuration);
float dProgress= CMTimeGetSeconds(progressTime) / CMTimeGetSeconds(songAsset.duration);
NSLog(#"%f",dProgress);
}
else
{
[assetWriterInput markAsFinished];
// [assetWriter finishWriting];
[assetReader cancelReading];
}
}
}];
}
But, as said here:
Since the iPhone shouldn't really be used for processor intensive things such as audio conversion.
So i recommend you the third solution, because it's easier and doesn't look like an intensive task for the Iphone processor.
I have concatenated 2 audio files using AudioFileReadPacketData to read the existing files and writing the data to a third file using AudioFileWritePackets. the source files seem to contain information about their duration. I can access it using the code below passing it the file url
-(float)lengthForUrl:(NSURL *)url
{
AVURLAsset* audioAsset = [AVURLAsset URLAssetWithURL:url options:nil];
CMTime audioDuration = audioAsset.duration;
float audioDurationSeconds = CMTimeGetSeconds(audioDuration);
return audioDurationSeconds;
}
when I make the same call on my new ( combined file) i get a duration of 0.
how can a set the duration on my new file ?
or preserve it in the transfer process ?
I did not figure out how to set the duration on my combined file but did
figure out how to create it with the duration intact.
I switched from using AssetReader / Writer to using ExtFileReader / Writer
-- working code is below
- (BOOL)combineFilesFor:(NSURL*)url
{
dispatch_async(dispatch_get_main_queue(), ^{
activity.hidden = NO;
[activity startAnimating];
});
NSArray *paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentsDirectory = [paths objectAtIndex:0];
NSString *soundOneNew = [documentsDirectory stringByAppendingPathComponent:Kcombined];
NSLog(#"%#",masterUrl);
NSLog(#"%#",url);
// create the url for the combined file
CFURLRef out_url = (__bridge CFURLRef)[NSURL fileURLWithPath:soundOneNew];
NSLog(#"out file url : %#", out_url);
ExtAudioFileRef af_in;
ExtAudioFileRef af_out = NULL;
Float64 existing_audioDurationSeconds = [self lengthForUrl:masterUrl];
NSLog(#"duration of existing file : %f", existing_audioDurationSeconds);
Float64 new_audioDurationSeconds = [self lengthForUrl:url];
NSLog(#"duration of existing file : %f", new_audioDurationSeconds);
Float64 combined_audioDurationSeconds = existing_audioDurationSeconds + new_audioDurationSeconds;
NSLog(#"duration of combined : %f", combined_audioDurationSeconds);
// put the urls of the files to combine in to an array
NSArray *sourceURLs = [NSArray arrayWithObjects:(__bridge id)((__bridge CFURLRef)
AudioStreamBasicDescription inputFileFormat;
AudioStreamBasicDescription converterFormat;
UInt32 thePropertySize = sizeof(inputFileFormat);
AudioStreamBasicDescription outputFileFormat;
#define BUFFER_SIZE 4096
UInt8 *buffer = NULL;
for (id in_url in sourceURLs) {
NSLog(#"url in = %#",in_url);
OSStatus in_stat = ExtAudioFileOpenURL ( (__bridge CFURLRef)(in_url), &af_in);
NSLog(#"in file open status : %d", (int)in_stat);
bzero(&inputFileFormat, sizeof(inputFileFormat));
in_stat = ExtAudioFileGetProperty(af_in, kExtAudioFileProperty_FileDataFormat,
&thePropertySize, &inputFileFormat);
NSLog(#"in file get property status : %d", (int)in_stat);
memset(&converterFormat, 0, sizeof(converterFormat));
converterFormat.mFormatID = kAudioFormatLinearPCM;
converterFormat.mSampleRate = inputFileFormat.mSampleRate;
converterFormat.mChannelsPerFrame = 1;
converterFormat.mBytesPerPacket = 2;
converterFormat.mFramesPerPacket = 1;
converterFormat.mBytesPerFrame = 2;
converterFormat.mBitsPerChannel = 16;
converterFormat.mFormatFlags = kAudioFormatFlagsNativeEndian |kAudioFormatFlagIsPacked | kAudioFormatFlagIsSignedInteger;
in_stat = ExtAudioFileSetProperty(af_in, kExtAudioFileProperty_ClientDataFormat,
sizeof(converterFormat), &converterFormat);
NSLog(#"in file set property status : %d", (int)in_stat);
if (af_out == NULL) {
memset(&outputFileFormat, 0, sizeof(outputFileFormat));
outputFileFormat.mFormatID = kAudioFormatMPEG4AAC;
outputFileFormat.mFormatFlags = kMPEG4Object_AAC_Main;
outputFileFormat.mSampleRate = inputFileFormat.mSampleRate;
outputFileFormat.mChannelsPerFrame = inputFileFormat.mChannelsPerFrame;
// create the new file to write the combined file 2
OSStatus out_stat = ExtAudioFileCreateWithURL(out_url, kAudioFileM4AType, &outputFileFormat,
NULL, kAudioFileFlags_EraseFile, &af_out);
NSLog(#"out file create status : %d", out_stat);
out_stat = ExtAudioFileSetProperty(af_out, kExtAudioFileProperty_ClientDataFormat,
sizeof(converterFormat), &converterFormat);
NSLog(#"out file set property status : %d", out_stat);
}
// Buffer to read from source file and write to dest file
buffer = malloc(BUFFER_SIZE);
assert(buffer);
AudioBufferList conversionBuffer;
conversionBuffer.mNumberBuffers = 1;
conversionBuffer.mBuffers[0].mNumberChannels = inputFileFormat.mChannelsPerFrame;
conversionBuffer.mBuffers[0].mData = buffer;
conversionBuffer.mBuffers[0].mDataByteSize = BUFFER_SIZE;
while (TRUE) {
conversionBuffer.mBuffers[0].mDataByteSize = BUFFER_SIZE;
UInt32 frameCount = INT_MAX;
if (inputFileFormat.mBytesPerFrame > 0) {
frameCount = (conversionBuffer.mBuffers[0].mDataByteSize / inputFileFormat.mBytesPerFrame);
}
// Read a chunk of input
OSStatus err = ExtAudioFileRead(af_in, &frameCount, &conversionBuffer);
if (err) {
NSLog(#"error read from input file : %d", (int)err);
}else{
NSLog(#"read %d frames from inout file", frameCount);
}
// If no frames were returned, conversion is finished
if (frameCount == 0)
break;
// Write pcm data to output file
err = ExtAudioFileWrite(af_out, frameCount, &conversionBuffer);
if (err) {
NSLog(#"error write out file : %d", (int)err);
}else{
NSLog(#"wrote %d frames to out file", frameCount);
}
}
ExtAudioFileDispose(af_in);
}
ExtAudioFileDispose (af_out);
NSLog(#"combined file duration =: %f", [self lengthForUrl:[NSURL fileURLWithPath:soundOneNew]]);
// move the new file into place
[self deleteFileWithName:KUpdate forType:#""];
[self deleteFileWithName:KMaster forType:#""];
player = nil;
[self replaceFileAtPath:KMaster withData:[NSData dataWithContentsOfFile:soundOneNew]];
[self deleteFileWithName:Kcombined forType:#""];
[self pauseRecording];
dispatch_async(dispatch_get_main_queue(), ^{
activity.hidden = YES;
[activity stopAnimating];
record.enabled = YES;
});
return YES;
}
I'm working on an audio recorder that renders the waveform of each recording. Whenever a recording is made, the NSURL of the audio file is converted to an AVAsset. With the AVAsset I'm able to extract the samples of the audio track. This works fine for audio recordings that are short ( <40seconds), but the process takes 15-20 seconds on a 2.5 min track and only gets worse the longer the track is. Anyone have any tips or recommendations on how to get around this problem?
AVAssetReader *reader = [[AVAssetReader alloc] initWithAsset:self.audioAsset error:&error];
AVAssetReaderTrackOutput *output = [[AVAssetReaderTrackOutput alloc] initWithTrack:songTrack outputSettings:outputSettingsDict];
[reader addOutput:output];
NSMutableData * data = [NSMutableData dataWithLength:32768];
NSMutableArray *allSamples = [NSMutableArray array];
while (reader.status == AVAssetReaderStatusReading) {
CMSampleBufferRef sampleBufferRef = [output copyNextSampleBuffer];
if (sampleBufferRef) {
CMBlockBufferRef blockBufferRef = CMSampleBufferGetDataBuffer(sampleBufferRef);
size_t bufferLength = CMBlockBufferGetDataLength(blockBufferRef);
if (data.length < bufferLength) {
[data setLength:bufferLength];
}
CMBlockBufferCopyDataBytes(blockBufferRef, 0, bufferLength, data.mutableBytes);
Float32 *samples = (Float32 *)data.mutableBytes;
int sampleCount = (int)(bufferLength / bytesPerInputSample);
for (int i = 0; i < sampleCount; i++) {
[allSamples addObject:#(samples[i*channelCount])];
}
CMSampleBufferInvalidate(sampleBufferRef);
CFRelease(sampleBufferRef);
}
}
I am trying to grab the audio track from a video as raw pcm data. The plan is pass this data into a float* table array in libpd. I have managed to get some sample data but the number of samples reported is way too low. For example for a 29sec clip I am getting a reported 3968 samples. What I am looking forward is the amplitudes at each sample. For a track of audio length of 29 secs I would expect to have an array of 1278900 in size (at 44.1khz).
Here is what I have put together based on other examples:
- (void)audioCapture {
NSLog(#"capturing");
NSError *error;
AVAssetReader* reader = [[AVAssetReader alloc] initWithAsset:self.videoAsset error:&error];
NSArray *audioTracks = [self.videoAsset tracksWithMediaType:AVMediaTypeAudio];
AVAssetTrack *audioTrack = nil;
if ([audioTracks count] > 0)
audioTrack = [audioTracks objectAtIndex:0];
// Decompress to Linear PCM with the asset reader
NSDictionary *decompressionAudioSettings = [NSDictionary dictionaryWithObjectsAndKeys:
[NSNumber numberWithUnsignedInt:kAudioFormatLinearPCM], AVFormatIDKey,
nil];
AVAssetReaderOutput *output = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:audioTrack outputSettings:decompressionAudioSettings];
[reader addOutput:output];
[reader startReading];
CMSampleBufferRef sample = [output copyNextSampleBuffer];
//array for our sample amp values
SInt16* samples = NULL;
//sample count;
CMItemCount numSamplesInBuffer = 0;
//sample buffer
CMBlockBufferRef buffer;
while( sample != NULL )
{
sample = [output copyNextSampleBuffer];
if( sample == NULL )
continue;
buffer = CMSampleBufferGetDataBuffer( sample );
size_t lengthAtOffset;
size_t totalLength;
char* data;
if( CMBlockBufferGetDataPointer( buffer, 0, &lengthAtOffset, &totalLength, &data ) != noErr )
{
NSLog( #"error!" );
break;
}
numSamplesInBuffer = CMSampleBufferGetNumSamples(sample);
AudioBufferList audioBufferList;
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
sample,
NULL,
&audioBufferList,
sizeof(audioBufferList),
NULL,
NULL,
kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
&buffer
);
for (int bufferCount=0; bufferCount < audioBufferList.mNumberBuffers; bufferCount++) {
samples = (SInt16 *)audioBufferList.mBuffers[bufferCount].mData;
NSLog(#"idx %f",(double)samples[bufferCount]);
}
}
NSLog(#"num samps in buf %ld",numSamplesInBuffer);
//[PdBase copyArray:(float *)samples
//toArrayNamed:#"DestTable" withOffset:0
//count:pdTableSize];
}
I've managed to get the raw data from a MPMediaItem using an AVAssetReader after combining the answers of a couple of SO questions like this one and this one and a nice blog post. I'm also able to play this raw data using FMOD, but then a problem arises.
It appears the resulting audio is of lower quality than the original track. Though AVAssetTrack formatDescription tells me there are 2 channels in the data, the result sounds mono. It also sounds a bit dampened (less crispy) like the bitrate is lowered.
Am I doing something wrong or is the quality of the MPMediaItem data lowered on purpose by the AVAssetReader (because of piracy)?
#define OUTPUTRATE 44100
Initializing the AVAssetReader and AVAssetReaderTrackOutput
// prepare AVAsset and AVAssetReaderOutput etc
MPMediaItem* mediaItem = ...;
NSURL* ipodAudioUrl = [mediaItem valueForProperty:MPMediaItemPropertyAssetURL];
AVURLAsset * asset = [[AVURLAsset alloc] initWithURL:ipodAudioUrl options:nil];
NSError * error = nil;
assetReader = [[AVAssetReader alloc] initWithAsset:asset error:&error];
if(error)
NSLog(#"error creating reader: %#", [error debugDescription]);
AVAssetTrack* songTrack = [asset.tracks objectAtIndex:0];
NSArray* trackDescriptions = songTrack.formatDescriptions;
numChannels = 2;
for(unsigned int i = 0; i < [trackDescriptions count]; ++i)
{
CMAudioFormatDescriptionRef item = (CMAudioFormatDescriptionRef)[trackDescriptions objectAtIndex:i];
const AudioStreamBasicDescription* bobTheDesc = CMAudioFormatDescriptionGetStreamBasicDescription (item);
if(bobTheDesc && bobTheDesc->mChannelsPerFrame == 1) {
numChannels = 1;
}
}
NSDictionary* outputSettingsDict = [[[NSDictionary alloc] initWithObjectsAndKeys:
[NSNumber numberWithInt:kAudioFormatLinearPCM],AVFormatIDKey,
[NSNumber numberWithInt:OUTPUTRATE],AVSampleRateKey,
[NSNumber numberWithInt:16],AVLinearPCMBitDepthKey,
[NSNumber numberWithBool:NO],AVLinearPCMIsBigEndianKey,
[NSNumber numberWithBool:NO],AVLinearPCMIsFloatKey,
[NSNumber numberWithBool:NO],AVLinearPCMIsNonInterleaved,
nil] autorelease];
AVAssetReaderTrackOutput * output = [[[AVAssetReaderTrackOutput alloc] initWithTrack:songTrack outputSettings:outputSettingsDict] autorelease];
[assetReader addOutput:output];
[assetReader startReading];
Initializing FMOD and the FMOD sound
// Init FMOD
FMOD_RESULT result = FMOD_OK;
unsigned int version = 0;
/*
Create a System object and initialize
*/
result = FMOD::System_Create(&system);
ERRCHECK(result);
result = system->getVersion(&version);
ERRCHECK(result);
if (version < FMOD_VERSION)
{
fprintf(stderr, "You are using an old version of FMOD %08x. This program requires %08x\n", version, FMOD_VERSION);
exit(-1);
}
result = system->setSoftwareFormat(OUTPUTRATE, FMOD_SOUND_FORMAT_PCM16, 1, 0, FMOD_DSP_RESAMPLER_LINEAR);
ERRCHECK(result);
result = system->init(32, FMOD_INIT_NORMAL | FMOD_INIT_ENABLE_PROFILE, NULL);
ERRCHECK(result);
// Init FMOD sound stream
CMTimeRange timeRange = [songTrack timeRange];
float durationInSeconds = timeRange.duration.value / timeRange.duration.timescale;
FMOD_CREATESOUNDEXINFO exinfo = {0};
memset(&exinfo, 0, sizeof(FMOD_CREATESOUNDEXINFO));
exinfo.cbsize = sizeof(FMOD_CREATESOUNDEXINFO); /* required. */
exinfo.decodebuffersize = OUTPUTRATE; /* Chunk size of stream update in samples. This will be the amount of data passed to the user callback. */
exinfo.length = OUTPUTRATE * numChannels * sizeof(signed short) * durationInSeconds; /* Length of PCM data in bytes of whole song (for Sound::getLength) */
exinfo.numchannels = numChannels; /* Number of channels in the sound. */
exinfo.defaultfrequency = OUTPUTRATE; /* Default playback rate of sound. */
exinfo.format = FMOD_SOUND_FORMAT_PCM16; /* Data format of sound. */
exinfo.pcmreadcallback = pcmreadcallback; /* User callback for reading. */
exinfo.pcmsetposcallback = pcmsetposcallback; /* User callback for seeking. */
result = system->createStream(NULL, FMOD_OPENUSER, &exinfo, &sound);
ERRCHECK(result);
result = system->playSound(FMOD_CHANNEL_FREE, sound, false, &channel);
ERRCHECK(result);
Reading from the AVAssetReaderTrackOutput into a ring buffer
AVAssetReaderTrackOutput * trackOutput = (AVAssetReaderTrackOutput *)[assetReader.outputs objectAtIndex:0];
CMSampleBufferRef sampleBufferRef = [trackOutput copyNextSampleBuffer];
if (sampleBufferRef)
{
AudioBufferList audioBufferList;
CMBlockBufferRef blockBuffer;
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBufferRef, NULL, &audioBufferList, sizeof(audioBufferList), NULL, NULL, 0, &blockBuffer);
if(blockBuffer == NULL)
{
stopLoading = YES;
continue;
}
if(&audioBufferList == NULL)
{
stopLoading = YES;
continue;
}
if(audioBufferList.mNumberBuffers != 1)
NSLog(#"numBuffers = %lu", audioBufferList.mNumberBuffers);
for( int y=0; y<audioBufferList.mNumberBuffers; y++ )
{
AudioBuffer audioBuffer = audioBufferList.mBuffers[y];
SInt8 *frame = (SInt8*)audioBuffer.mData;
for(int i=0; i<audioBufferList.mBuffers[y].mDataByteSize; i++)
{
ringBuffer->push_back(frame[i]);
}
}
CMSampleBufferInvalidate(sampleBufferRef);
CFRelease(sampleBufferRef);
}
I'm not familiar with FMOD, so I can't comment there. AVAssetReader doesn't do any "copy protection" stuff, so that's not a worry. (If you can get the AVAssetURL, the track is DRM free)
Since you are using non-interleaved buffers, there will only be one buffer, so I guess your last bit of code might be wrong
Here's an example of some code that's working well for me. Btw, your for loop is probably not going to be very performant. You may consider using memcpy or something...
If you are not restricted to your existing ring buffer, try TPCircularBuffer (https://github.com/michaeltyson/TPCircularBuffer) it is amazing.
CMSampleBufferRef nextBuffer = NULL;
if(_reader.status == AVAssetReaderStatusReading)
{
nextBuffer = [_readerOutput copyNextSampleBuffer];
}
if (nextBuffer)
{
AudioBufferList abl;
CMBlockBufferRef blockBuffer;
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
nextBuffer,
NULL,
&abl,
sizeof(abl),
NULL,
NULL,
kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
&blockBuffer);
// the correct way to get the number of bytes in the buffer
size_t size = CMSampleBufferGetTotalSampleSize(nextBuffer);
memcpy(ringBufferTail, abl.mBuffers[0].mData, size);
CFRelease(nextBuffer);
CFRelease(blockBuffer);
}
Hope this helps
You're initialiazing FMOD to output mono audio. Try
result = system->setSoftwareFormat(OUTPUTRATE, FMOD_SOUND_FORMAT_PCM16, 2, 0, FMOD_DSP_RESAMPLER_LINEAR);