I'm using the example found here: https://developer.xamarin.com/samples/monotouch/MultichannelMixer/
I modified the example to add a third input to the mixer that is a simple sine wave. I want the two audio files to play alongside the sine wave. I do this by specifying a separate AudioStreamBasicDescription for the sine wave and adding code inside the mixer's HandleRenderDelegate to generate its samples.
The problem I'm facing is that this only works if I use an audio format with float samples instead of signed integers for the sine wave. If I try using integers, the result is that the sound from the audio files is heavily distorted and there is no sine wave.
Here is the code I'm using for the float format (which works):
desc = new AudioStreamBasicDescription()
{
BitsPerChannel = 32,
Format = AudioFormatType.LinearPCM,
FormatFlags = AudioFormatFlags.IsFloat | AudioFormatFlags.IsPacked,
SampleRate = 44100,
ChannelsPerFrame = 1,
FramesPerPacket = 1,
BytesPerFrame = 2,
BytesPerPacket = 2
};
Generating the sine wave in the HandleRenderDelegate:
if (busNumber == 2) { // generate sine wave
var outL = (float*)data[0].Data;
var outR = (float*)data[1].Data;
float s;
for (int i = 0; i < numberFrames; i++) {
s = (float)(Math.Sin((double)pitchSample / (44100 / 440) * 2 * Math.PI)) * .2f;
pitchSample++;
outL[i] = outR[i] = s;
}
return AudioUnitStatus.OK;
}
The code above is working as expected, but now I want to use a signed-integer format for the sine wave. So I modify the format:
desc = new AudioStreamBasicDescription()
{
BitsPerChannel = 32,
Format = AudioFormatType.LinearPCM,
FormatFlags = AudioFormatFlags.IsSignedInteger | AudioFormatFlags.IsPacked,
SampleRate = 44100,
ChannelsPerFrame = 1,
FramesPerPacket = 1,
BytesPerFrame = 2,
BytesPerPacket = 2
};
And the wave generator:
if (busNumber == 2) {
var outL = (int*)data[0].Data;
var outR = (int*)data[1].Data;
short s;
for (int i = 0; i < numberFrames; i++) {
s = (short)(Math.Sin((double)pitchSample / (44100 / 440) * 2 * Math.PI) * .2 * Int16.MaxValue);
pitchSample++;
outL[i] = outR[i] = s;
}
return AudioUnitStatus.OK;
}
With the modified code I get distorted sounding audio files and no sine wave.
Why does it work with float samples but not integers?
I'm not sure these are the only problems, but you have inconsistent settings:
BitsPerChannel is set to 32 instead of 16
you would want to cast to (short *) instead of (int *)
ChannelsPerFrame is set to 1, but you write to two channel buffers in the generator (I'm not sure whether that's a bug)
(I'm assuming here that you want to use 16-bit signed integers.)
Note: in the float case BytesPerFrame and BytesPerPacket should be 4, since each Float32 sample occupies 4 bytes.
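To make that concrete, here is a minimal sketch of the 16-bit signed-integer setup, assuming you keep the mono, packed layout of your float version; it is just your posted code with BitsPerChannel and the pointer cast corrected:
desc = new AudioStreamBasicDescription()
{
    BitsPerChannel = 16,   // 16-bit samples
    Format = AudioFormatType.LinearPCM,
    FormatFlags = AudioFormatFlags.IsSignedInteger | AudioFormatFlags.IsPacked,
    SampleRate = 44100,
    ChannelsPerFrame = 1,
    FramesPerPacket = 1,
    BytesPerFrame = 2,     // 16 bits = 2 bytes per (mono) frame
    BytesPerPacket = 2
};
And in the render delegate, cast to short* so that each store writes 2 bytes:
if (busNumber == 2) {
    var outL = (short*)data[0].Data;
    var outR = (short*)data[1].Data;
    for (int i = 0; i < numberFrames; i++) {
        var s = (short)(Math.Sin((double)pitchSample / (44100 / 440) * 2 * Math.PI) * .2 * Int16.MaxValue);
        pitchSample++;
        outL[i] = outR[i] = s;
    }
    return AudioUnitStatus.OK;
}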
I'm writing some code to render the camera preview using SkiaSharp. It's cross-platform, but I ran into a problem while writing the Android implementation.
I needed to convert YUV_420_888 to RGB8888 because that's what SkiaSharp supports, and with the help of this thread I somehow managed to show decent-quality images on my SkiaSharp canvas. The problem is the speed. At best I can get about 8 fps, but usually it's just 4 or 5 fps. It turned out the biggest factor is the conversion. I now have about 3 versions of my ToRGB converter, and I've even ended up trying "unsafe" code and parallel loops. I'll just show you my best one yet.
private unsafe byte[] ToRgb(byte[] yValuesArr, byte[] uValuesArr,
byte[] vValuesArr, int uvPixelStride, int uvRowStride)
{
var width = PixelSize.Width;
var height = PixelSize.Height;
var rgb = new byte[width * height * 4];
var partitions = Partitioner.Create(0, height);
Parallel.ForEach(partitions, range =>
{
var (item1, item2) = range;
Parallel.For(item1, item2, y =>
{
for (var x = 0; x < width; x++)
{
var yIndex = x + width * y;
var currentPosition = yIndex * 4;
var uvIndex = uvPixelStride * (x / 2) + uvRowStride * (y / 2);
fixed (byte* rgbFixed = rgb)
fixed (byte* yValuesFixed = yValuesArr)
fixed (byte* uValuesFixed = uValuesArr)
fixed (byte* vValuesFixed = vValuesArr)
{
var rgbPtr = rgbFixed;
var yValues = yValuesFixed;
var uValues = uValuesFixed;
var vValues = vValuesFixed;
var yy = *(yValues + yIndex);
var uu = *(uValues + uvIndex);
var vv = *(vValues + uvIndex);
var rTmp = yy + vv * 1436 / 1024 - 179;
var gTmp = yy - uu * 46549 / 131072 + 44 - vv * 93604 / 131072 + 91;
var bTmp = yy + uu * 1814 / 1024 - 227;
rgbPtr = rgbPtr + currentPosition;
*rgbPtr = (byte) (rTmp < 0 ? 0 : rTmp > 255 ? 255 : rTmp);
rgbPtr++;
*rgbPtr = (byte) (gTmp < 0 ? 0 : gTmp > 255 ? 255 : gTmp);
rgbPtr++;
*rgbPtr = (byte) (bTmp < 0 ? 0 : bTmp > 255 ? 255 : bTmp);
rgbPtr++;
*rgbPtr = 255;
}
}
});
});
return rgb;
}
You can also find it on my repo, along with the part where I render the output to SkiaSharp.
For a preview size of 1440x1080, running on my phone, this code takes about 120 ms to finish. Even if all the other parts were optimized, the most I could get from that is about 8 fps. And no, it's not my hardware, because the built-in camera app runs smoothly. By the way, 1440x1080 is the output of my ChooseOptimalSize algorithm, which I got from the MonoDroid examples of Android's Camera2 API. I don't know whether that's the best approach, or whether it's missing logic to detect the achievable fps and size the preview down to make things faster.
Does SkiaSharp support GPU drawing? If you connect the camera to a SurfaceTexture, you can use the preview frames as GL textures and render them efficiently into an OpenGL scene.
Even if not, you may still get faster results by sending the frames to the GPU and reading them back to the CPU with something like glReadPixels, as that will do the RGB conversion on the GPU.
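If you do go the glReadPixels route, a minimal sketch of the readback step (using the GLES 2.0 C API; read_back_rgba is a hypothetical helper, and it assumes the camera frame has already been drawn into the currently bound framebuffer with an RGBA color attachment) could look like this:
#include <GLES2/gl2.h>

/* rgba must point to width * height * 4 bytes */
static void read_back_rgba(int width, int height, unsigned char *rgba)
{
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, rgba);
}
The resulting RGBA buffer can then be handed to SkiaSharp the same way your converted array is used now.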
I'm writing a stereo wave file with AudioFileWriteBytes (CoreAudio / iOS) and the only way I can get it to work is by calling it for each sample on each channel.
The following code works:
// Prepare the AudioStreamBasicDescription
AudioStreamBasicDescription asbd = {
.mSampleRate = session.samplerate,
.mFormatID = kAudioFormatLinearPCM,
.mFormatFlags = kAudioFormatFlagIsBigEndian| kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked,
.mChannelsPerFrame = 2,
.mBitsPerChannel = 16,
.mFramesPerPacket = 1, // Always 1 for uncompressed formats
.mBytesPerPacket = 4, // 16 bits for 2 channels = 4 bytes
.mBytesPerFrame = 4 // 16 bits for 2 channels = 4 bytes
};
// Set up the file
AudioFileID audioFile;
OSStatus audioError = noErr;
audioError = AudioFileCreateWithURL((__bridge CFURLRef)fileURL, kAudioFileAIFFType, &asbd, kAudioFileFlags_EraseFile, &audioFile);
if (audioError != noErr) {
NSLog(#"Error creating file");
return;
}
// Write samples
UInt64 currentFrame = 0;
while (currentFrame < totalLengthInFrames) {
UInt64 numberOfFramesToWrite = totalLengthInFrames - currentFrame;
if (numberOfFramesToWrite > 2048) {
numberOfFramesToWrite = 2048;
}
UInt32 sampleByteCount = sizeof(int16_t);
UInt32 bytesToWrite = (UInt32)numberOfFramesToWrite * sampleByteCount;
int16_t *sampleBufferLeft = (int16_t *)malloc(bytesToWrite);
int16_t *sampleBufferRight = (int16_t *)malloc(bytesToWrite);
// Some magic to fill the buffers
for (int j = 0; j < numberOfFramesToWrite; j++) {
int16_t left = CFSwapInt16HostToBig(sampleBufferLeft[j]);
int16_t right = CFSwapInt16HostToBig(sampleBufferRight[j]);
audioError = AudioFileWriteBytes(audioFile, false, (currentFrame + j) * 4, &sampleByteCount, &left);
assert(audioError == noErr);
audioError = AudioFileWriteBytes(audioFile, false, (currentFrame + j) * 4 + 2, &sampleByteCount, &right);
assert(audioError == noErr);
}
free(sampleBufferLeft);
free(sampleBufferRight);
currentFrame += numberOfFramesToWrite;
}
However, it is (obviously) very slow and inefficient.
I can't find anything on how to use it with a big buffer so that I can write more than a single sample while also writing 2 channels.
I tried making a buffer going LRLRLRLR (left / right), and then write that with just one AudioFileWriteBytes call. I expected that to work, but it produced a file filled with noise.
This is the code:
UInt64 currentFrame = 0;
UInt64 bytePos = 0;
while (currentFrame < totalLengthInFrames) {
UInt64 numberOfFramesToWrite = totalLengthInFrames - currentFrame;
if (numberOfFramesToWrite > 2048) {
numberOfFramesToWrite = 2048;
}
UInt32 sampleByteCount = sizeof(int16_t);
UInt32 bytesInBuffer = (UInt32)numberOfFramesToWrite * sampleByteCount;
UInt32 bytesInOutputBuffer = (UInt32)numberOfFramesToWrite * sampleByteCount * 2;
int16_t *sampleBufferLeft = (int16_t *)malloc(bytesInBuffer);
int16_t *sampleBufferRight = (int16_t *)malloc(bytesInBuffer);
int16_t *outputBuffer = (int16_t *)malloc(bytesInOutputBuffer);
// Some magic to fill the buffers
for (int j = 0; j < numberOfFramesToWrite; j++) {
int16_t left = CFSwapInt16HostToBig(sampleBufferLeft[j]);
int16_t right = CFSwapInt16HostToBig(sampleBufferRight[j]);
outputBuffer[(j * 2)] = left;
outputBuffer[(j * 2) + 1] = right;
}
audioError = AudioFileWriteBytes(audioFile, false, bytePos, &bytesInOutputBuffer, &outputBuffer);
assert(audioError == noErr);
free(sampleBufferLeft);
free(sampleBufferRight);
free(outputBuffer);
bytePos += bytesInOutputBuffer;
currentFrame += numberOfFramesToWrite;
}
I also tried to just write the buffers at once (2048*L, 2048*R, etc.) which I did not expect to work, and it didn't.
How do I speed this up AND get a working wave file?
I tried making a buffer going LRLRLRLR (left / right), and then write that with just one AudioFileWriteBytes call.
This is the correct approach if using (the rather difficult) Audio File Services.
If possible, instead of the very low-level Audio File Services, use Extended Audio File Services. It is a wrapper around Audio File Services that has built-in format converters. Or better yet, use AVAudioFile, which is a wrapper around Extended Audio File Services and covers most common use cases.
If you are set on using Audio File Services, you'll have to interleave the audio manually, as in the second snippet you posted.
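If you stay with AudioFileWriteBytes, a minimal sketch of that interleaved write (reusing the variable names from your loop) could look like the following. One detail worth double-checking in your attempt: the call passes &outputBuffer, the address of the pointer variable, rather than outputBuffer itself, so it writes the pointer's bytes instead of the samples.
UInt32 bytesToWrite = (UInt32)(numberOfFramesToWrite * 2 * sizeof(int16_t)); // 2 channels, 2 bytes each
int16_t *interleaved = (int16_t *)malloc(bytesToWrite);
for (UInt32 j = 0; j < numberOfFramesToWrite; j++) {
    interleaved[2 * j]     = CFSwapInt16HostToBig(sampleBufferLeft[j]);  // left sample first
    interleaved[2 * j + 1] = CFSwapInt16HostToBig(sampleBufferRight[j]); // then right
}
// Pass the buffer itself, not the address of the pointer variable
audioError = AudioFileWriteBytes(audioFile, false, bytePos, &bytesToWrite, interleaved);
assert(audioError == noErr);
free(interleaved);
bytePos += bytesToWrite;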
I am developing an app to display a sine wave.
Using AudioQueueNewOutput to produce mono sound works fine, but when it comes to stereo output I have no idea how to do it.
I know that setting mChannelsPerFrame = 2 should produce sound on both the left and right channels.
I also want to know the order in which bytes are sent to the left and right channels. Does the first byte go to the left channel and the second byte to the right channel?
Code:
_audioFormat = new AudioStreamBasicDescription();
_audioFormat->mSampleRate = SAMPLE_RATE; // 44100
_audioFormat->mFormatID = kAudioFormatLinearPCM;
_audioFormat->mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
_audioFormat->mFramesPerPacket = 1;
_audioFormat->mChannelsPerFrame = NUM_CHANNELS; // 1
_audioFormat->mBitsPerChannel = BITS_PER_CHANNEL; // 16
_audioFormat->mBytesPerPacket = BYTES_PER_FRAME; // 2
_audioFormat->mBytesPerFrame = BYTES_PER_FRAME; // 2
and
_sineTableLength = _audioFormat.mSampleRate / SAMPLE_LIMIT_FACTOR; // 44100/100 = 441
_sineTable = new SInt16[_sineTableLength];
for(int i = 0; i < _sineTableLength; i++)
{
// Transfer values between -1.0 and 1.0 to integer values between -sample max and sample max
_sineTable[i] = (SInt16)(sin(i * 2 * M_PI / _sineTableLength) * 32767);
}
and
AudioQueueNewOutput (&_audioFormat,
playbackCallback,
(__bridge void *)(self),
nil,
nil,
0,
&_queueObject);
static void playbackCallback (void* inUserData,
AudioQueueRef inAudioQueue,
AudioQueueBufferRef bufferReference){
SInt16* sample = (SInt16*)bufferReference->mAudioData;
// bufferSize 1024
for(int i = 0; i < bufferSize; i += _audioFormat.mBytesPerFrame, sample++)
{
// set value for *sample
// 9ms sin wave and 4.5ms 0
...
}
...
AudioQueueEnqueueBuffer(...)
}
Several days later, I found the answer.
First: the AudioStreamBasicDescription can be set up just as above, but for stereo (mChannelsPerFrame = 2, which doubles mBytesPerFrame and mBytesPerPacket to 4).
Then: bufferSize changes from 1024 to 2048.
And: the SInt16 in SInt16* sample = (SInt16*)bufferReference->mAudioData; changes to SInt32, because with twice the channels each frame holds twice the bits.
Last: each 16-bit half of a frame carries the data for one channel (left or right), so just feed it whatever you want.
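As a minimal sketch (reusing the identifiers from the question, and assuming the format above changed to mChannelsPerFrame = 2 with mBytesPerFrame = mBytesPerPacket = 4), the callback's fill loop could look like this:
SInt16 *sample = (SInt16 *)bufferReference->mAudioData;
UInt32 frames = bufferSize / 4;                    // 4 bytes per interleaved stereo frame
for (UInt32 i = 0; i < frames; i++) {
    SInt16 s = _sineTable[i % _sineTableLength];   // or whatever value you want to play
    *sample++ = s;                                 // the first 16-bit sample of a frame goes to the left channel
    *sample++ = s;                                 // the second goes to the right channel
}
bufferReference->mAudioDataByteSize = frames * 4;
AudioQueueEnqueueBuffer(inAudioQueue, bufferReference, 0, NULL);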
I've been using this web page as a guideline for formant tracking of speech...
http://iitg.vlab.co.in/?sub=59&brch=164&sim=615&cnt=1
It all seems to be going pretty well, except for the last step, which is converting the cepstrum into a smoothed representation for simple peak picking for the formant tracking. The spectrograph looks good, and the cepstrograph (can I say that? :P) also looks good (from what I can tell), but at the final stage the results (the smoothed formant representation) are not what I expected.
I uploaded a sample of each stage as visual images to...
http://imgur.com/a/62duS
This sample is for the speech of the sound 'i' as in 'beed'. According to this site...
http://home.cc.umanitoba.ca/~robh/howto.html#formants
the first formant should come in around 500 Hz, and the second and third around 2200 Hz and 2800 Hz respectively. The spectrograph shows something very similar, but at the last stage I am getting results similar to...
F1 - 891
F2 - 1550
F3 - 2329
Any insight would be greatly appreciated. I've been going round in circles on this for some time. My code looks as follows...
// set up fft parameters
UInt32 log2n = 9;
UInt32 n = 512;
UInt32 window = n;
UInt32 halfN = n/2;
UInt32 stride = 1;
FFTSetup setupReal = [appDelegate getFftSetup];
int stepSize = (hpBuffer.sampleCount-window) / quantizeCount;
// calculate volume from raw samples, because it seems more reliable than the fft
UInt32 volumeWindow = 128;
volumeBuffer = malloc(sizeof(float)*quantizeCount);
int windowPos = 0;
for (int i=0; i < quantizeCount; i++) {
windowPos += stepSize;
float total = 0.0f;
float max = 0.0f;
for (int p=windowPos; p < windowPos+volumeWindow; p++) {
total += sampleBuffer.buffer[p];
if (sampleBuffer.buffer[p] > max)
max = sampleBuffer.buffer[p];
}
volumeBuffer[i] = max;
}
// normalize volumebuffer
[FloatAudioBuffer normalizePositiveBuffer:volumeBuffer ofSize:quantizeCount];
// allocate memory for complex array
COMPLEX_SPLIT complexArray;
complexArray.realp = (float*)malloc(4096*sizeof(float));
complexArray.imagp = (float*)malloc(4096*sizeof(float));
// allocate some space for temporary hamming buffer
float *hamBuffer = malloc(n*sizeof(float));
// create spectrum and feature buffer
spectrumBuffer = malloc(sizeof(float)*halfN*quantizeCount);
formantBuffer = malloc(sizeof(float)*4096*quantizeCount);
cepstrumBuffer = malloc(sizeof(float)*halfN*quantizeCount);
lowCepstrumBuffer = malloc(sizeof(float)*featureCount*quantizeCount);
featureBuffer = malloc(sizeof(float)*featureCount*quantizeCount);
// create data point for each quantize segment
float TWOPI = 2.0f * M_PI;
for (int s=0; s < quantizeCount; s++) {
// copy buffer data into a seperate array and apply hamming window
int offset = (int)(s * stepSize);
for (int i=0; i < n; i++)
hamBuffer[i] = hpBuffer.buffer[offset+i] * ((1.0f-0.46f) - 0.46f*cos(TWOPI*i/((float)n-1.0f)));
// configure float array into acceptable input array format (interleaved)
vDSP_ctoz((COMPLEX*)hamBuffer, 2, &complexArray, 1, halfN);
// run FFT
vDSP_fft_zrip(setupReal, &complexArray, stride, log2n, FFT_FORWARD);
// Absolute square (equivalent to mag^2)
complexArray.imagp[0] = 0.0f;
vDSP_zvmags(&complexArray, 1, complexArray.realp, 1, halfN);
bzero(complexArray.imagp, (halfN) * sizeof(float));
// scale
float scale = 1.0f / (2.0f*(float)n);
vDSP_vsmul(complexArray.realp, 1, &scale, complexArray.realp, 1, halfN);
// get log of absolute values for passing to inverse FFT for cepstrum
for (int i=0; i < halfN; i++)
complexArray.realp[i] = logf(sqrtf(complexArray.realp[i]));
// save this into spectrum buffer
memcpy(&spectrumBuffer[s*halfN], complexArray.realp, halfN*sizeof(float));
// convert spectrum to interleaved ready for inverse fft
vDSP_ctoz((COMPLEX*)&spectrumBuffer[s*halfN], 2, &complexArray, 1, halfN/2);
// create cepstrum
vDSP_fft_zrip(setupReal, &complexArray, stride, log2n-1, FFT_INVERSE);
//convert interleaved to real and straight into cepstrum buffer
vDSP_ztoc(&complexArray, 1, (COMPLEX*)&cepstrumBuffer[s*halfN], 2, halfN/2);
// copy first part of cepstrum into low cepstrum buffer
memcpy(&lowCepstrumBuffer[s*featureCount], &cepstrumBuffer[s*halfN], featureCount*sizeof(float));
// make 8000 point array based on the first 15 values
float *tempArray = malloc(8192*sizeof(float));
for (int i=0; i < 8192; i++) {
if (i < 15)
tempArray[i] = cepstrumBuffer[s*halfN+i];
else
tempArray[i] = 0.0f;
}
vDSP_ctoz((COMPLEX*)tempArray, 2, &complexArray, 1, 4096);
float newLog2n = log2f(8192.0f);
complexArray.imagp[0] = 0.0f;
vDSP_fft_zrip(setupReal, &complexArray, stride, newLog2n, FFT_FORWARD);
vDSP_zvmags(&complexArray, 1, complexArray.realp, 1, 4096);
bzero(complexArray.imagp, (4096) * sizeof(float));
// scale
scale = 1.0f / (2.0f*(float)8192);
vDSP_vsmul(complexArray.realp, 1, &scale, complexArray.realp, 1, 4096);
// get magnitude
for (int i=0; i < 4096; i++)
complexArray.realp[i] = sqrtf(complexArray.realp[i]);
// write to formant buffer
memcpy(&formantBuffer[s*4096], complexArray.realp, 4096*sizeof(float));
// complex array now contains formant spectrum
// it's large, so get features here!
// try simple peak picking algorithm for first 3 formants
int formantIndex = 0;
float *peaks = malloc(6*sizeof(float));
for (int i=0; i < 6; i++)
peaks[i] = 0.0f;
for (int i=1; i < 4096-1 && formantIndex < 6; i++) {
if (complexArray.realp[i-1] < complexArray.realp[i] &&
complexArray.realp[i+1] < complexArray.realp[i])
peaks[formantIndex++] = i;
}
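For reference, a peak's bin index in the 8192-point spectrum above corresponds to a frequency of index * sampleRate / 8192. A minimal helper for that conversion (binToHz is a hypothetical name; pass whatever sample rate the capture uses):
static float binToHz(int binIndex, float sampleRate)
{
    // 8192-point real FFT: bins 0..4095 span 0 Hz to sampleRate / 2
    return binIndex * sampleRate / 8192.0f;
}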
I have a really short audio file, say a tenth of a second long, in (say) .PCM format.
I want to use RemoteIO to loop through the file repeatedly to produce a continuous musical tone. So how do I read this into an array of floats?
EDIT: while I could probably dig out the file format, extract the file into an NSData and process it manually, I'm guessing there is a more sensible generic approach... ( that eg copes with different formats )
You can use ExtAudioFile to read data from any supported data format in numerous client formats. Here is an example to read a file as 16-bit integers:
CFURLRef url = /* ... */;
ExtAudioFileRef eaf;
OSStatus err = ExtAudioFileOpenURL((CFURLRef)url, &eaf);
if(noErr != err)
/* handle error */
AudioStreamBasicDescription format;
format.mSampleRate = 44100;
format.mFormatID = kAudioFormatLinearPCM;
format.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
format.mBitsPerChannel = 16;
format.mChannelsPerFrame = 2;
format.mBytesPerFrame = format.mChannelsPerFrame * 2;
format.mFramesPerPacket = 1;
format.mBytesPerPacket = format.mFramesPerPacket * format.mBytesPerFrame;
err = ExtAudioFileSetProperty(eaf, kExtAudioFileProperty_ClientDataFormat, sizeof(format), &format);
/* Read the file contents using ExtAudioFileRead */
If you wanted Float32 data, you would set up format like this:
format.mFormatID = kAudioFormatLinearPCM;
format.mFormatFlags = kAudioFormatFlagsNativeFloatPacked;
format.mBitsPerChannel = 32;
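The read loop itself might look like the following (a minimal sketch, assuming the 16-bit, stereo, interleaved client format from the first snippet; ExtAudioFileRead reports 0 frames once the end of the file is reached):
int16_t samples[2048 * 2];                     // room for 2048 interleaved stereo frames
AudioBufferList bufferList;
bufferList.mNumberBuffers = 1;
bufferList.mBuffers[0].mNumberChannels = 2;
for (;;) {
    UInt32 frames = 2048;                      // frames requested for this read
    bufferList.mBuffers[0].mDataByteSize = sizeof(samples);
    bufferList.mBuffers[0].mData = samples;
    err = ExtAudioFileRead(eaf, &frames, &bufferList);
    if (noErr != err) { /* handle error */ break; }
    if (0 == frames) break;                    // end of file
    /* process `frames` interleaved stereo frames from `samples` */
}
ExtAudioFileDispose(eaf);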
This is the code I used to convert my audio data (an audio file) into a floating-point representation and save it into an array.
-(void) PrintFloatDataFromAudioFile {
NSString * name = #"Filename"; //YOUR FILE NAME
NSString * source = [[NSBundle mainBundle] pathForResource:name ofType:#"m4a"]; // SPECIFY YOUR FILE FORMAT
const char *cString = [source cStringUsingEncoding:NSASCIIStringEncoding];
CFStringRef str = CFStringCreateWithCString(
NULL,
cString,
kCFStringEncodingMacRoman
);
CFURLRef inputFileURL = CFURLCreateWithFileSystemPath(
kCFAllocatorDefault,
str,
kCFURLPOSIXPathStyle,
false
);
ExtAudioFileRef fileRef;
ExtAudioFileOpenURL(inputFileURL, &fileRef);
AudioStreamBasicDescription audioFormat;
audioFormat.mSampleRate = 44100; // GIVE YOUR SAMPLING RATE
audioFormat.mFormatID = kAudioFormatLinearPCM;
audioFormat.mFormatFlags = kLinearPCMFormatFlagIsFloat;
audioFormat.mBitsPerChannel = sizeof(Float32) * 8;
audioFormat.mChannelsPerFrame = 1; // Mono
audioFormat.mBytesPerFrame = audioFormat.mChannelsPerFrame * sizeof(Float32); // == sizeof(Float32)
audioFormat.mFramesPerPacket = 1;
audioFormat.mBytesPerPacket = audioFormat.mFramesPerPacket * audioFormat.mBytesPerFrame; // = sizeof(Float32)
// 3) Apply audio format to the Extended Audio File
ExtAudioFileSetProperty(
fileRef,
kExtAudioFileProperty_ClientDataFormat,
sizeof (AudioStreamBasicDescription), //= audioFormat
&audioFormat);
int numSamples = 1024; //How many samples to read in at a time
UInt32 sizePerPacket = audioFormat.mBytesPerPacket; // = sizeof(Float32) = 4 bytes
UInt32 packetsPerBuffer = numSamples;
UInt32 outputBufferSize = packetsPerBuffer * sizePerPacket;
// So the lvalue of outputBuffer is the memory location where we have reserved space
UInt8 *outputBuffer = (UInt8 *)malloc(sizeof(UInt8) * outputBufferSize);
AudioBufferList convertedData ;//= malloc(sizeof(convertedData));
convertedData.mNumberBuffers = 1; // Set this to 1 for mono
convertedData.mBuffers[0].mNumberChannels = audioFormat.mChannelsPerFrame; //also = 1
convertedData.mBuffers[0].mDataByteSize = outputBufferSize;
convertedData.mBuffers[0].mData = outputBuffer; //
UInt32 frameCount = numSamples;
float *samplesAsCArray;
int j =0;
double floatDataArray[882000] ; // SPECIFY YOUR DATA LIMIT MINE WAS 882000 , SHOULD BE EQUAL TO OR MORE THAN DATA LIMIT
while (frameCount > 0) {
ExtAudioFileRead(
fileRef,
&frameCount,
&convertedData
);
if (frameCount > 0) {
AudioBuffer audioBuffer = convertedData.mBuffers[0];
samplesAsCArray = (float *)audioBuffer.mData; // CAST YOUR mData INTO FLOAT
for (int i =0; i<1024 /*numSamples */; i++) { //YOU CAN PUT numSamples INTEAD OF 1024
floatDataArray[j] = (double)samplesAsCArray[i] ; //PUT YOUR DATA INTO FLOAT ARRAY
printf("\n%f",floatDataArray[j]); //PRINT YOUR ARRAY'S DATA IN FLOAT FORM RANGING -1 TO +1
j++;
}
}
}}
I'm not familiar with RemoteIO, but I am familiar with WAV's and thought I'd post some format information on them. If you need, you should be able to easily parse out information such as duration, bit rate, etc...
First, here is an excellent website detailing the WAVE PCM soundfile format. This site also does an excellent job illustrating what the different byte addresses inside the "fmt" sub-chunk refer to.
WAVE File format
A WAVE is composed of a "RIFF" chunk and subsequent sub-chunks
Every chunk is at least 8 bytes
First 4 bytes is the Chunk ID
Next 4 bytes is the Chunk Size (The Chunk Size gives the size of the remainder of the chunk excluding the 8 bytes used for the Chunk ID and Chunk Size)
Every WAVE has the following chunks / sub chunks
"RIFF" (first and only chunk. All the rest are technically sub-chunks.)
"fmt " (usually the first sub-chunk after "RIFF" but can be anywhere between "RIFF" and "data". This chunk has information about the WAV such as number of channels, sample rate, and byte rate)
"data" (must be the last sub-chunk and contains all the sound data)
Common WAVE Audio Formats:
PCM
IEEE_Float
PCM_EXTENSIBLE (with a sub format of PCM or IEEE_FLOAT)
WAVE Duration and Size
A WAVE File's duration can be calculated as follows:
seconds = DataChunkSize / ByteRate
Where
ByteRate = SampleRate * NumChannels * BitsPerSample/8
and DataChunkSize does not include the 8 bytes reserved for the ID and Size of the "data" sub-chunk.
Knowing this, the DataChunkSize can be calculated if you know the duration of the WAV and the ByteRate.
DataChunkSize = seconds * ByteRate
This can be useful for calculating the size of the wav data when converting from formats like mp3 or wma. Note that a typical wav header is 44 bytes followed by DataChunkSize (this is always the case if the wav was converted using the Normalizer tool - at least as of this writing).
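Putting those formulas into code (wav_duration_seconds is a hypothetical helper name):
#include <stdint.h>

/* Duration of the "data" sub-chunk, given the fields from the "fmt " sub-chunk. */
double wav_duration_seconds(uint32_t dataChunkSize, uint32_t sampleRate,
                            uint16_t numChannels, uint16_t bitsPerSample)
{
    double byteRate = (double)sampleRate * numChannels * (bitsPerSample / 8.0);
    return dataChunkSize / byteRate;  /* e.g. 176400 / (44100 * 2 * 2) = 1 second */
}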
Update for Swift 5
This is a simple function that helps get your audio file into an array of floats. It works for both mono and stereo audio; to get the second channel of stereo audio, just uncomment samples2.
import AVFoundation
//..
do {
guard let url = Bundle.main.url(forResource: "audio_example", withExtension: "wav") else { return }
let file = try AVAudioFile(forReading: url)
if let format = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: file.fileFormat.sampleRate, channels: file.fileFormat.channelCount, interleaved: false), let buf = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(file.length)) {
try file.read(into: buf)
guard let floatChannelData = buf.floatChannelData else { return }
let frameLength = Int(buf.frameLength)
let samples = Array(UnsafeBufferPointer(start:floatChannelData[0], count:frameLength))
// let samples2 = Array(UnsafeBufferPointer(start:floatChannelData[1], count:frameLength))
print("samples")
print(samples.count)
print(samples.prefix(10))
// print(samples2.prefix(10))
}
} catch {
print("Audio Error: \(error)")
}