I'm currently trying to convert the audio samples from an AVAudioPCMBuffer to NSData. I took a look at the accepted answer on this SO post and this code from GitHub, but it appears some of the AVFAudio APIs have changed. Below is the extension I have for AVAudioPCMBuffer:
private extension AVAudioPCMBuffer {
    func toNSData() -> NSData {
        let channels = UnsafeBufferPointer(start: int16ChannelData, count: 1)
        let ch0Data = NSData(bytes: channels[0], length: Int(frameCapacity * format.streamDescription.inTotalBitsPerChannel))
        return ch0Data
    }
}
I'm seeing an error of Value of type 'UnsafePointer<AudioStreamBasicDescription>' has no member 'inTotalBitsPerChannel'. So far, I haven't been able to find any other way to get the inTotalBitsPerChannel value...any help appreciated!
I don't see any method named inTotalBitsPerChannel in either of the code samples you linked to; instead, they both seem to use mBytesPerFrame. You will also need .pointee to dereference the pointer. Finally, in modern Swift, you should generally prefer to use Data over NSData. So, basically, I think your extension should work if you rewrite the last line to:
let ch0Data = Data(bytes: channels[0], count: Int(frameCapacity * format.streamDescription.pointee.mBytesPerFrame))
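Putting it all together, a minimal sketch of the corrected extension. One extra note (my addition, not from the answer above): frameLength is the number of valid frames, while frameCapacity is the number of allocated frames, so frameLength is usually what you want when the buffer isn't completely full:

import AVFoundation

private extension AVAudioPCMBuffer {
    func toData() -> Data {
        // Assumes 16-bit samples, as in the original question's int16ChannelData
        let channels = UnsafeBufferPointer(start: int16ChannelData, count: 1)
        // frameLength counts valid frames; mBytesPerFrame converts frames to bytes
        return Data(bytes: channels[0],
                    count: Int(frameLength * format.streamDescription.pointee.mBytesPerFrame))
    }
}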
I'm working on an app that uses the video feed from the DJI Mavic 2 and runs it through a machine learning model to identify objects.
I managed to get my app to preview the feed from the drone using this sample DJI project, but I'm having a lot of trouble trying to get the video data into a format that's usable by the Vision framework.
I used this example from Apple as a guide to create my model (which is working!), but it looks like I need to create a VNImageRequestHandler object, which is created with a CVPixelBuffer (in Apple's example, extracted from a CMSampleBuffer), in order to use Vision.
Any idea how to make this conversion? Is there a better way to do this?
class DJICameraViewController: UIViewController, DJIVideoFeedListener, DJISDKManagerDelegate, DJICameraDelegate, VideoFrameProcessor {

    // ...

    func videoFeed(_ videoFeed: DJIVideoFeed, didUpdateVideoData rawData: Data) {
        let videoData = rawData as NSData
        let videoBuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: videoData.length)
        videoData.getBytes(videoBuffer, length: videoData.length)
        DJIVideoPreviewer.instance().push(videoBuffer, length: Int32(videoData.length))
    }

    // MARK: VideoFrameProcessor Protocol Implementation

    func videoProcessorEnabled() -> Bool {
        // This is never called
        return true
    }

    func videoProcessFrame(_ frame: UnsafeMutablePointer<VideoFrameYUV>!) {
        // This is never called
        let pixelBuffer = frame.pointee.cv_pixelbuffer_fastupload as! CVPixelBuffer
        let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                                        orientation: exifOrientationFromDeviceOrientation(),
                                                        options: [:])
        do {
            try imageRequestHandler.perform(self.requests)
        } catch {
            print(error)
        }
    }
} // End of DJICameraViewController class
EDIT: from what I've gathered from DJI's (spotty) documentation, it looks like the video feed is compressed H.264. They claim DJIWidget includes helper methods for decompression, but I haven't had any success figuring out how to use them correctly, because there is no documentation on their use.
EDIT 2: Here's the issue I created on GitHub for the DJIWidget framework
EDIT 3: Updated code snippet with additional methods for VideoFrameProcessor, removing old code from videoFeed method
EDIT 4: Details about how to extract the pixel buffer successfully and utilize it can be found in this comment from GitHub
The steps:

1. Call DJIVideoPreviewer's push:length: method and feed it the rawData. (If you are using VideoPreviewerSDKAdapter, skip this step.) H.264 parsing and decoding are performed inside DJIVideoPreviewer once you do this.

2. Conform to the VideoFrameProcessor protocol and call DJIVideoPreviewer.registFrameProcessor to register the VideoFrameProcessor protocol object; a minimal registration sketch follows this list.

3. The VideoFrameProcessor protocol's videoProcessFrame: method will output the VideoFrameYUV data.

4. Get the CVPixelBuffer data. The VideoFrameYUV struct has a cv_pixelbuffer_fastupload field; when hardware decoding is turned on, this field actually holds a CVPixelBuffer. If you are using software decoding, you will need to create a CVPixelBuffer yourself and copy the data from the VideoFrameYUV's luma, chromaB, and chromaR fields.
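For step 2, the registration call itself is small; a sketch assuming DJIWidget's registFrameProcessor: method bridges to Swift like this, and that self conforms to VideoFrameProcessor:

// Register self so the decoder starts calling videoProcessFrame(_:) (step 2)
DJIVideoPreviewer.instance().registFrameProcessor(self)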
Code for step 4 (the software-decoding copy):

VideoFrameYUV *yuvFrame; // the VideoFrameProcessor output
CVPixelBufferRef pixelBuffer = NULL;
CVReturn result = CVPixelBufferCreate(kCFAllocatorDefault,
                                      yuvFrame->width,
                                      yuvFrame->height,
                                      kCVPixelFormatType_420YpCbCr8Planar,
                                      NULL,
                                      &pixelBuffer);
if (result != kCVReturnSuccess || pixelBuffer == NULL ||
    CVPixelBufferLockBaseAddress(pixelBuffer, 0) != kCVReturnSuccess) {
    return;
}
// Query the plane geometry from the buffer itself, then copy the
// Y, Cb, and Cr planes out of the VideoFrameYUV struct.
long yPlaneWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0);
long yPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
long uPlaneWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, 1);
long uPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 1);
long vPlaneWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, 2);
long vPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 2);
uint8_t *yDestination = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
memcpy(yDestination, yuvFrame->luma, yPlaneWidth * yPlaneHeight);
uint8_t *uDestination = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1);
memcpy(uDestination, yuvFrame->chromaB, uPlaneWidth * uPlaneHeight);
uint8_t *vDestination = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 2);
memcpy(vDestination, yuvFrame->chromaR, vPlaneWidth * vPlaneHeight);
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
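Since the question's code is Swift, here is an untested Swift translation of the same software-decoding path. It assumes the same VideoFrameYUV fields (width, height, luma, chromaB, chromaR) and, like the Objective-C above, that the planes carry no per-row padding:

// Untested sketch: build a CVPixelBuffer from a VideoFrameYUV when
// hardware decoding (cv_pixelbuffer_fastupload) is unavailable.
func makePixelBuffer(from frame: UnsafeMutablePointer<VideoFrameYUV>) -> CVPixelBuffer? {
    var pixelBuffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                     Int(frame.pointee.width),
                                     Int(frame.pointee.height),
                                     kCVPixelFormatType_420YpCbCr8Planar,
                                     nil,
                                     &pixelBuffer)
    guard status == kCVReturnSuccess, let buffer = pixelBuffer else { return nil }

    _ = CVPixelBufferLockBaseAddress(buffer, [])
    defer { _ = CVPixelBufferUnlockBaseAddress(buffer, []) }

    // Y, Cb, and Cr sources, in plane order 0, 1, 2
    let sources: [UnsafeMutableRawPointer?] = [
        UnsafeMutableRawPointer(frame.pointee.luma),
        UnsafeMutableRawPointer(frame.pointee.chromaB),
        UnsafeMutableRawPointer(frame.pointee.chromaR)
    ]
    for (plane, source) in sources.enumerated() {
        guard let src = source,
              let dst = CVPixelBufferGetBaseAddressOfPlane(buffer, plane) else { continue }
        // Same assumption as the Objective-C version: bytes per plane = width * height
        let count = CVPixelBufferGetWidthOfPlane(buffer, plane)
                  * CVPixelBufferGetHeightOfPlane(buffer, plane)
        memcpy(dst, src, count)
    }
    return buffer
}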
I am trying to perform an FFT on Spotify's audio stream using EZAudio.
Following this suggestion, I have subclassed SPTCoreAudioController, overrode attemptToDeliverAudioFrames:ofCount:streamDescription:, and initialized my SPTAudioStreamingController with my new class successfully.
Spotify does not say whether the pointer passed into the overridden function points to floats, doubles, integers, etc. I have interpreted it as many different data types, all of which failed, leaving me confused as to whether my FFT is wrong or my audio buffer is wrong. Here is Spotify's documentation on SPTCoreAudioController.
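(One way to check rather than guessing: the AudioStreamBasicDescription passed alongside the buffer describes the sample format via standard Core Audio flags. A quick sketch:)

// Inspect the stream description instead of guessing the sample type.
// These are standard Core Audio flags, not anything Spotify-specific.
let isFloat = (audioDescription.mFormatFlags & kAudioFormatFlagIsFloat) != 0
let isSignedInt = (audioDescription.mFormatFlags & kAudioFormatFlagIsSignedInteger) != 0
print("float: \(isFloat), signed int: \(isSignedInt), bits: \(audioDescription.mBitsPerChannel)")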
Assuming the audio buffer is a buffer of floats, here is one of my attempts at FFT:
class GetAudioPCM: SPTCoreAudioController, EZAudioFFTDelegate {
    let ViewControllerFFTWindowSize: vDSP_Length = 128
    var fft: EZAudioFFTRolling?
    //var fft: EZAudioFFT?

    override func attempt(toDeliverAudioFrames audioFrames: UnsafeRawPointer!, ofCount frameCount: Int, streamDescription audioDescription: AudioStreamBasicDescription) -> Int {
        if let fft = fft {
            let newPointer = UnsafeMutableRawPointer(mutating: audioFrames)!.assumingMemoryBound(to: Float.self)
            let resultBuffer: UnsafeMutablePointer<Float> = fft.computeFFT(withBuffer: newPointer, withBufferSize: 128)
            print("results: \(resultBuffer.pointee)")
        } else {
            fft = EZAudioFFTRolling(windowSize: ViewControllerFFTWindowSize, sampleRate: Float(audioDescription.mSampleRate), delegate: self)
            //fft = EZAudioFFT(maximumBufferSize: 128, sampleRate: Float(audioDescription.mSampleRate))
        }
        return super.attempt(toDeliverAudioFrames: audioFrames, ofCount: frameCount, streamDescription: audioDescription)
    }

    func fft(_ fft: EZAudioFFT!, updatedWithFFTData fftData: UnsafeMutablePointer<Float>, bufferSize: vDSP_Length) {
        print("\n \n DATA ---------------------")
        print(bufferSize)
        if (fft?.fftData) != nil {
            print("First: \(fftData.pointee)")
            for i: Int in 0..<Int(bufferSize) {
                print(fftData[i], terminator: " :: ")
            }
        }
    }
}
I make my custom class the EZAudioFFTDelegate, initialize an EZAudioFFTRolling object (I've also tried a plain EZAudioFFT), and tell it to perform an FFT on the buffer. I chose a small size of 128 just for initial testing.
I have tried different data types for the buffer and different FFT methods. I figured that using a well-known library's FFT should give me correct results, yet the output from this and similar attempts produced 'nan' for almost every single float in the result buffer.
Is the way I access Spotify's audio buffer wrong, or is my FFT process at fault?
This might be an amateur question, but although I have searched Stack Overflow extensively, I haven't been able to find an answer to my specific problem.
I was successful in creating a GIF file from an array of images by following a GitHub example:
func createGIF(with images: [NSImage], name: NSURL, loopCount: Int = 0, frameDelay: Double) {
    let destinationURL = name
    let destinationGIF = CGImageDestinationCreateWithURL(destinationURL, kUTTypeGIF, images.count, nil)!

    // This dictionary controls the delay between frames
    // If you don't specify this, CGImage will apply a default delay
    let properties = [
        (kCGImagePropertyGIFDictionary as String): [(kCGImagePropertyGIFDelayTime as String): frameDelay]
    ]

    for img in images {
        // Convert an NSImage to CGImage, fitting within the specified rect
        let cgImage = img.CGImageForProposedRect(nil, context: nil, hints: nil)!

        // Add the frame to the GIF image
        CGImageDestinationAddImage(destinationGIF, cgImage, properties)
    }

    // Write the GIF file to disk
    CGImageDestinationFinalize(destinationGIF)
}
Now, I would like to turn the actual GIF into NSData so I can upload it to Firebase, and be able to retrieve it on another device.
To achieve my goal, I have two options: either find out how to use the code above to extract the GIF that was created (which seems to be written straight to a file), or use the images in the function's parameters to create a new GIF but keep it in NSData form.
Does anybody have any ideas on how to do this?
Since nobody has stepped in for over six months, I will just put the answer from Sachin Vas's comment here:
You can get the data using NSData(contentsOf: URL)
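That covers the read-back route. Alternatively, if you'd rather never touch disk, CGImageDestinationCreateWithData writes the GIF straight into an NSMutableData. A sketch of both, mirroring the question's Swift version (the createGIFData name is just for illustration):

// Option 1: after CGImageDestinationFinalize has run, read the file back:
//     let gifData = NSData(contentsOf: destinationURL as URL)

// Option 2: build the GIF entirely in memory; the NSMutableData ends up
// holding exactly the bytes that would have gone into the file.
func createGIFData(with images: [NSImage], frameDelay: Double) -> NSData {
    let data = NSMutableData()
    let destination = CGImageDestinationCreateWithData(data, kUTTypeGIF, images.count, nil)!
    let properties = [
        (kCGImagePropertyGIFDictionary as String): [(kCGImagePropertyGIFDelayTime as String): frameDelay]
    ]
    for img in images {
        let cgImage = img.CGImageForProposedRect(nil, context: nil, hints: nil)!
        CGImageDestinationAddImage(destination, cgImage, properties)
    }
    CGImageDestinationFinalize(destination)
    return data
}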
I've been stuck on this problem for days now and have looked through nearly every related StackOverflow page. Through this, I now have a much greater understanding of what FFT is and how it works. Despite this, I'm having extreme difficulties implementing it into my application.
In short, what I am trying to do is make a spectrum visualizer for my application (similar to this). From what I've gathered, I'm pretty sure I need to use the magnitudes of the sound as the heights of my bars. With all this in mind, I am currently able to analyze an entire .caf file at once. To do this, I am using the following code:
let audioFile = try! AVAudioFile(forReading: soundURL!)
let frameCount = UInt32(audioFile.length)

// Read the whole file into a single PCM buffer
let buffer = AVAudioPCMBuffer(PCMFormat: audioFile.processingFormat, frameCapacity: frameCount)
do {
    try audioFile.readIntoBuffer(buffer, frameCount: frameCount)
} catch {
}

// Set up a real-to-complex FFT over the nearest power-of-two size
let log2n = UInt(round(log2(Double(frameCount))))
let bufferSize = Int(1 << log2n)
let fftSetup = vDSP_create_fftsetup(log2n, Int32(kFFTRadix2))

// Pack the samples into split-complex form, run the forward FFT,
// then take the squared magnitude of each bin
var realp = [Float](count: bufferSize / 2, repeatedValue: 0)
var imagp = [Float](count: bufferSize / 2, repeatedValue: 0)
var output = DSPSplitComplex(realp: &realp, imagp: &imagp)

vDSP_ctoz(UnsafePointer<DSPComplex>(buffer.floatChannelData.memory), 2, &output, 1, UInt(bufferSize / 2))
vDSP_fft_zrip(fftSetup, &output, 1, log2n, Int32(FFT_FORWARD))

var fft = [Float](count: Int(bufferSize / 2), repeatedValue: 0.0)
let bufferOver2: vDSP_Length = vDSP_Length(bufferSize / 2)
vDSP_zvmags(&output, 1, &fft, 1, bufferOver2)
This works fine and outputs a long array of data. However, the problem with this code is that it analyzes the entire audio file at once. What I need is to analyze the audio file as it is playing, very similar to this video: Spectrum visualizer.
So I guess my question is this: How do you perform FFT analysis while the audio is playing?
Also, on top of this, how do I go about converting the output of an FFT analysis into actual heights for a bar? One of the outputs I received for an audio file using the FFT analysis code above was this: http://pastebin.com/RBLTuGx7 (it's on Pastebin only because of how long it is). I'm assuming I should average all these numbers together and use those values instead? (For reference, I got that array by printing out the 'fft' variable in the code above.)
I've attempted reading through the EZAudio code, but I am unable to find how they read in samples of audio in real time. Any help is greatly appreciated.
Here's how it is done in AudioKit, using EZAudio's FFT tools:
Create a class for your FFT that will hold the data:
@objc public class AKFFT: NSObject, EZAudioFFTDelegate {

    internal let bufferSize: UInt32 = 512
    internal var fft: EZAudioFFT?

    /// Array of FFT data
    public var fftData = [Double](count: 512, repeatedValue: 0.0)

    ...
}
Initialize the class and set up the FFT. Also install the tap on the appropriate node.
public init(_ input: AKNode) {
    super.init()
    fft = EZAudioFFT.fftWithMaximumBufferSize(vDSP_Length(bufferSize), sampleRate: 44100.0, delegate: self)
    input.avAudioNode.installTapOnBus(0, bufferSize: bufferSize, format: AKManager.format) { [weak self] (buffer, time) -> Void in
        if let strongSelf = self {
            buffer.frameLength = strongSelf.bufferSize
            let offset: Int = Int(buffer.frameCapacity - buffer.frameLength)
            let tail = buffer.floatChannelData[0]
            strongSelf.fft!.computeFFTWithBuffer(&tail[offset], withBufferSize: strongSelf.bufferSize)
        }
    }
}
Then implement the callback to load your internal fftData array:
@objc public func fft(fft: EZAudioFFT!, updatedWithFFTData fftData: UnsafeMutablePointer<Float>, bufferSize: vDSP_Length) {
    dispatch_async(dispatch_get_main_queue()) { () -> Void in
        for i in 0...511 {
            self.fftData[i] = Double(fftData[i])
        }
    }
}
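To turn those magnitudes into bar heights (the second part of the question), one common approach, not taken from AudioKit, is to average groups of bins into one bar and compress the huge dynamic range with a log scale. A rough sketch:

import Foundation

// Rough sketch: collapse 512 FFT magnitudes into barCount bar heights by
// averaging the bins for each bar, then log-scaling so quiet content stays
// visible. Any leftover bins at the top are ignored.
func barHeights(from fftData: [Double], barCount: Int = 20, maxHeight: Double = 100.0) -> [Double] {
    let binsPerBar = fftData.count / barCount
    return (0..<barCount).map { bar in
        let start = bar * binsPerBar
        let slice = fftData[start..<start + binsPerBar]
        let average = slice.reduce(0, combine: +) / Double(binsPerBar)
        // vDSP_zvmags produces squared magnitudes, so 10 * log10 gives dB power;
        // clamp the input to avoid log10(0)
        let db = 10.0 * log10(max(average, 1e-12))
        // Map roughly -120 dB...0 dB onto 0...maxHeight
        return max(0.0, min(1.0, (db + 120.0) / 120.0)) * maxHeight
    }
}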
AudioKit's implementation may change, so you should check https://github.com/audiokit/AudioKit/ to see if any improvements were made. EZAudio is at https://github.com/syedhali/EZAudio
I'm developing an iOS app that handles a Bluetooth SensorTag.
That SensorTag is based on the TI BLE SensorTag, but we had some of the sensors removed.
In the source code of the original iOS app from TI, the XYZ values are calculated as follows,
with KXTJ9_RANGE defined as 1.0 in my implementation; KXTJ9 is the accelerometer built into the SensorTag:
+ (float)calcXValue:(NSData *)data {
    char scratchVal[data.length];
    [data getBytes:&scratchVal length:3];
    return ((scratchVal[0] * 1.0) / (64 / KXTJ9_RANGE));
}
The data arrives as hexadecimal, like "fe850d", and the method cuts it into 3 parts.
Now I'm trying to convert this method to Swift, but I get the wrong numbers back.
For example, fe should return something around 0.02, which the Objective-C code does.
My Swift code so far:
class Sensor: NSObject {
    let Range: Float = 1.0
    var data: NSData
    var bytes: [Byte] = [0x00, 0x00, 0x00]

    init(data: NSData) {
        self.data = data
        data.getBytes(&bytes, length: data.length)
    }

    func calcXValue() -> Float {
        return ((Float(bytes[0]) * 1.0) / (64.0 / Range))
    }
    ...
}
I believe the problem must lie in my Float(bytes[0]), because it makes 254 out of fe, whereas scratchVal[0] in Objective-C is around 64.
My main problem is that I was completely new to iOS programming when I had to begin this project, so I chose Swift to learn while coding the app.
Right now I am using TI's original Objective-C code with our SensorTag, but I would prefer to use Swift for every part of the app.
On all current iOS and OS X platforms, char is a signed quantity, so the input fe is treated as a negative number. Byte, on the other hand, is an alias for UInt8, which is unsigned. Use an [Int8] array instead to get the same behaviour as in your Objective-C code, as in the sketch below.
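Applied to the question's class, the change is just the array's element type (a sketch keeping everything else as-is):

class Sensor: NSObject {
    let Range: Float = 1.0
    var data: NSData
    // Int8 is signed, matching Objective-C's char, so the byte 0xfe becomes -2
    var bytes: [Int8] = [0x00, 0x00, 0x00]

    init(data: NSData) {
        self.data = data
        data.getBytes(&bytes, length: data.length)
    }

    func calcXValue() -> Float {
        // For 0xfe: -2 / (64 / 1.0) = -0.03125, same as the Objective-C version
        return (Float(bytes[0]) * 1.0) / (64.0 / Range)
    }
}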
It depends on the BLE device's endianness relative to your iOS device's endianness (see the Wikipedia article on endianness).
To keep it simple, try both of the following methods:
NSData *data4 = [completeData subdataWithRange:NSMakeRange(0, 4)];
int value = CFSwapInt32BigToHost(*(int*)([data4 bytes]));
or
NSData *data4 = [completeData subdataWithRange:NSMakeRange(0, 4)];
int value = CFSwapInt32LittleToHost(*(int*)([data4 bytes]));
And check which one makes more sense when you parse the data.
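Since the rest of your project is Swift, the same check there is only a couple of lines (a sketch assuming the NSData holds at least four bytes):

var raw: UInt32 = 0
data.getBytes(&raw, length: 4)

// Interpret the same four bytes both ways and see which value is plausible
let asBigEndian = UInt32(bigEndian: raw)
let asLittleEndian = UInt32(littleEndian: raw)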