Using DJI video feed with Vision Framework - ios

I'm working on an app that uses the video feed from the DJI Mavic 2 and runs it through a machine learning model to identify objects.
I managed to get my app to preview the feed from the drone using this sample DJI project, but I'm having a lot of trouble trying to get the video data into a format that's usable by the Vision framework.
I used this example from Apple as a guide to create my model (which is working!), but it looks like I need to create a VNImageRequestHandler, which is initialized with a CVPixelBuffer (normally extracted from a CMSampleBuffer), in order to use Vision.
Any idea how to make this conversion? Is there a better way to do this?
class DJICameraViewController: UIViewController, DJIVideoFeedListener, DJISDKManagerDelegate, DJICameraDelegate, VideoFrameProcessor {
    // ...
    func videoFeed(_ videoFeed: DJIVideoFeed, didUpdateVideoData rawData: Data) {
        let videoData = rawData as NSData
        let videoBuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: videoData.length)
        videoData.getBytes(videoBuffer, length: videoData.length)
        DJIVideoPreviewer.instance().push(videoBuffer, length: Int32(videoData.length))
    }
    // MARK: VideoFrameProcessor Protocol Implementation
    func videoProcessorEnabled() -> Bool {
        // This is never called
        return true
    }
    func videoProcessFrame(_ frame: UnsafeMutablePointer<VideoFrameYUV>!) {
        // This is never called
        let pixelBuffer = frame.pointee.cv_pixelbuffer_fastupload as! CVPixelBuffer
        let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: exifOrientationFromDeviceOrientation(), options: [:])
        do {
            try imageRequestHandler.perform(self.requests)
        } catch {
            print(error)
        }
    }
} // End of DJICameraViewController class
EDIT: from what I've gathered from DJI's (spotty) documentation, it looks like the video feed is compressed H264. They claim the DJIWidget includes helper methods for decompression, but I haven't had success in understanding how to use them correctly because there is no documentation surrounding its use.
EDIT 2: Here's the issue I created on GitHub for the DJIWidget framework
EDIT 3: Updated code snippet with additional methods for VideoFrameProcessor, removing old code from videoFeed method
EDIT 4: Details about how to extract the pixel buffer successfully and utilize it can be found in this comment from GitHub

The steps:
1. Call DJIVideoPreviewer's push:length: method with the rawData. This triggers the H.264 parsing and decoding inside DJIVideoPreviewer. If you are using VideoPreviewerSDKAdapter, skip this step.
2. Conform to the VideoFrameProcessor protocol and call DJIVideoPreviewer.registFrameProcessor to register the object that implements it.
3. The VideoFrameProcessor protocol's videoProcessFrame: method will then output the VideoFrameYUV data.
4. Get the CVPixelBuffer data. The VideoFrameYUV struct has a cv_pixelbuffer_fastupload field, which is actually a CVPixelBuffer when hardware decoding is turned on. If you are using software decoding, you need to create a CVPixelBuffer yourself and copy the data from the VideoFrameYUV's luma, chromaB and chromaR fields.
Code:
VideoFrameYUV *yuvFrame; // the VideoFrameProcessor output
CVPixelBufferRef pixelBuffer = NULL;
CVReturn result = CVPixelBufferCreate(kCFAllocatorDefault,
                                      yuvFrame->width,
                                      yuvFrame->height,
                                      kCVPixelFormatType_420YpCbCr8Planar,
                                      NULL,
                                      &pixelBuffer);
// Check that creation succeeded before touching the buffer.
if (result != kCVReturnSuccess || pixelBuffer == NULL) {
    return;
}
if (CVPixelBufferLockBaseAddress(pixelBuffer, 0) != kCVReturnSuccess) {
    CVPixelBufferRelease(pixelBuffer);
    return;
}
long yPlaneWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0);
long yPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
long uPlaneWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, 1);
long uPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 1);
long vPlaneWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, 2);
long vPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 2);
// Each memcpy below assumes the destination plane is tightly packed,
// i.e. its bytes per row equals its width.
uint8_t *yDestination = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
memcpy(yDestination, yuvFrame->luma, yPlaneWidth * yPlaneHeight);
uint8_t *uDestination = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1);
memcpy(uDestination, yuvFrame->chromaB, uPlaneWidth * uPlaneHeight);
uint8_t *vDestination = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 2);
memcpy(vDestination, yuvFrame->chromaR, vPlaneWidth * vPlaneHeight);
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
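One caveat with copying a whole plane in a single memcpy: Core Video may pad each row of a plane, so the destination's bytes per row (CVPixelBufferGetBytesPerRowOfPlane) can be larger than the plane's width. In that case the copy has to go row by row. Here is a minimal sketch of that idea in Swift, using plain arrays so it runs anywhere; copyPlane is a hypothetical helper, not part of DJIWidget, and it assumes the source rows are tightly packed.

```swift
import Foundation

/// Copies a tightly packed source plane (`width` bytes per row) into a
/// destination whose rows start `destinationBytesPerRow` bytes apart.
func copyPlane(from source: UnsafePointer<UInt8>,
               to destination: UnsafeMutablePointer<UInt8>,
               width: Int,
               height: Int,
               destinationBytesPerRow: Int) {
    precondition(destinationBytesPerRow >= width, "stride must cover a full row")
    for row in 0..<height {
        // Source rows are contiguous; destination rows may be padded.
        memcpy(destination + row * destinationBytesPerRow,
               source + row * width,
               width)
    }
}

// A 3x2 plane copied into a buffer with a 4-byte stride.
let source: [UInt8] = [1, 2, 3, 4, 5, 6]
var destination = [UInt8](repeating: 0, count: 8)
source.withUnsafeBufferPointer { src in
    destination.withUnsafeMutableBufferPointer { dst in
        copyPlane(from: src.baseAddress!, to: dst.baseAddress!,
                  width: 3, height: 2, destinationBytesPerRow: 4)
    }
}
print(destination)
```

With a real pixel buffer, the destination pointer would come from CVPixelBufferGetBaseAddressOfPlane and the stride from CVPixelBufferGetBytesPerRowOfPlane for the same plane index.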

Related

How to run TFlite Object Detection with a single image in Swift?

I got the TensorFlow example app for iOS from here. My model works fine in that app for real-time detection, but I'd like to run it on a single image. As far as I can see, the main call that runs the model is:
self.result = self.modelDataHandler?.runModel(onFrame: buffer)
This buffer variable is a CVPixelBuffer, I can obtain it from a video frame using CMSampleBufferGetImageBuffer() as the tf's app does. But my app is not using frames, so I don't have this option.
My captured photo is a UIImage, I tried to convert it to a CVPixelBuffer to use it with the code above:
let ciImage: CIImage = CIImage(cgImage: (self.image?.cgImage)!)
let buffer: CVPixelBuffer = self.getBuffer(from: ciImage)!
The getBuffer() is:
func getBuffer(from image: CIImage) -> CVPixelBuffer? {
    let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue, kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
    var pixelBuffer : CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault, Int(image.extent.width), Int(image.extent.height), kCVPixelFormatType_32BGRA, attrs, &pixelBuffer)
    guard (status == kCVReturnSuccess) else {
        print("Error converting ciImage to CVPixelBuffer")
        return nil
    }
    return pixelBuffer
}
And then run it with:
self.result = self.modelDataHandler?.runModel(onFrame: buffer)
let inferences: [Inference] = self.result!.inferences
let time: Double = self.result!.inferenceTime
As a result I get an inference time of about 50 or 60 ms, but the inferences come back empty. I don't know whether my conversion from UIImage to CVPixelBuffer is right or whether there is another error or step that I'm forgetting.
If you have some questions, please ask me, any help would be great! Thanks.
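I've found my problem: my conversion from UIImage to CVPixelBuffer was wrong, and no CIImage is needed. The getBuffer() function above only allocates a pixel buffer with the image's dimensions; it never draws the image into it, so the model was running on a blank buffer. From this question I got the right code to do this conversion.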
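For reference, one common way to do this conversion is to draw the CGImage straight into the pixel buffer's memory through a CGContext. The following is a minimal sketch under stated assumptions, not the code from the linked question: makePixelBuffer(from:) is a hypothetical helper, it produces a 32BGRA buffer at the image's pixel size, and it ignores UIImage orientation.

```swift
import CoreGraphics
import CoreVideo

/// Creates a 32BGRA CVPixelBuffer and renders `cgImage` into it.
func makePixelBuffer(from cgImage: CGImage) -> CVPixelBuffer? {
    let width = cgImage.width
    let height = cgImage.height
    let attrs: [CFString: Any] = [
        kCVPixelBufferCGImageCompatibilityKey: true,
        kCVPixelBufferCGBitmapContextCompatibilityKey: true
    ]
    var pixelBuffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                                     kCVPixelFormatType_32BGRA,
                                     attrs as CFDictionary, &pixelBuffer)
    guard status == kCVReturnSuccess, let buffer = pixelBuffer else { return nil }

    CVPixelBufferLockBaseAddress(buffer, [])
    defer { CVPixelBufferUnlockBaseAddress(buffer, []) }

    // BGRA in memory corresponds to little-endian 32-bit with alpha first.
    let bitmapInfo = CGImageAlphaInfo.premultipliedFirst.rawValue
        | CGBitmapInfo.byteOrder32Little.rawValue
    guard let context = CGContext(data: CVPixelBufferGetBaseAddress(buffer),
                                  width: width,
                                  height: height,
                                  bitsPerComponent: 8,
                                  bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: bitmapInfo) else { return nil }

    // This is the step the original getBuffer() was missing.
    context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
    return buffer
}
```

In the question's setup this would be called with self.image?.cgImage after unwrapping it, and the returned buffer passed to runModel(onFrame:).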

Save depth images from TrueDepth camera

I am trying to save depth images from the iPhone X TrueDepth camera. Using the AVCamPhotoFilter sample code, I am able to view the depth, converted to grayscale, on the screen of the phone in real time. I cannot figure out how to save the sequence of depth images in the raw (16 bits or more) format.
I have depthData which is an instance of AVDepthData. One of its members is depthDataMap which is an instance of CVPixelBuffer and image format type kCVPixelFormatType_DisparityFloat16. Is there a way to save it to the phone to transfer for offline manipulation?
There's no standard video format for "raw" depth/disparity maps, which might have something to do with AVCapture not really offering a way to record it.
You have a couple of options worth investigating here:
Convert depth maps to grayscale textures (which you can do using the code in the AVCamPhotoFilter sample code), then pass those textures to AVAssetWriter to produce a grayscale video. Depending on the video format and grayscale conversion method you choose, other software you write for reading the video might be able to recover depth/disparity info with sufficient precision for your purposes from the grayscale frames.
Anytime you have a CVPixelBuffer, you can get at the data yourself and do whatever you want with it. Use CVPixelBufferLockBaseAddress (with the readOnly flag) to make sure the content won't change while you read it, then copy data from the pointer CVPixelBufferGetBaseAddress provides to wherever you want. (Use other pixel buffer functions to see how many bytes to copy, and unlock the buffer when you're done.)
Watch out, though: if you spend too much time copying from buffers, or otherwise retain them, they won't get deallocated as new buffers come in from the capture system, and your capture session will hang. (All told, it's unclear without testing whether a device has the memory & I/O bandwidth for much recording this way.)
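If you go the raw-copy route, the offline tool then has to interpret the bytes. For kCVPixelFormatType_DisparityFloat16 each pixel is an IEEE 754 half-precision value. Below is a minimal sketch of decoding such values in plain Swift, assuming the bytes were written as-is on the device (little-endian); floatFromHalf and floats(fromLittleEndianHalfBytes:) are hypothetical helper names, and row padding would have to be skipped by the caller.

```swift
/// Decodes one IEEE 754 half-precision value from its raw 16-bit pattern.
func floatFromHalf(bits: UInt16) -> Float {
    let sign: FloatingPointSign = (bits & 0x8000) != 0 ? .minus : .plus
    let exponent = Int((bits >> 10) & 0x1F)
    let mantissa = Float(bits & 0x03FF)
    switch exponent {
    case 0:
        // Zero or subnormal: mantissa * 2^-24
        return Float(sign: sign, exponent: -24, significand: mantissa)
    case 31:
        // Infinity (mantissa == 0) or NaN
        if mantissa == 0 {
            return sign == .minus ? -Float.infinity : Float.infinity
        }
        return Float.nan
    default:
        // Normal: (1 + mantissa / 1024) * 2^(exponent - 15)
        return Float(sign: sign, exponent: exponent - 15,
                     significand: 1 + mantissa / 1024)
    }
}

/// Decodes a run of little-endian half-precision values.
func floats(fromLittleEndianHalfBytes bytes: [UInt8]) -> [Float] {
    return stride(from: 0, to: bytes.count - 1, by: 2).map { index -> Float in
        let bits = UInt16(bytes[index]) | (UInt16(bytes[index + 1]) << 8)
        return floatFromHalf(bits: bits)
    }
}

let decoded = floats(fromLittleEndianHalfBytes: [0x00, 0x3C, 0x00, 0xC0, 0x00, 0x38])
print(decoded)
```

The manual decode avoids depending on a native half-precision type, which is not available on every platform the offline tool might run on.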
You can use Apple's Compression library to write the raw CVPixelBuffer data to a zlib-compressed file.
There are a few problems with this solution:
It's a lot of data, and zip is not a good compression method for it (the compressed file is 20 times bigger than a 32 bits per frame video with the same number of frames).
Apple's Compression library creates a file that standard zip programs don't open, because COMPRESSION_ZLIB emits a raw DEFLATE stream without a zlib or zip header. I use zlib in C code to read it, calling inflateInit2(&strm, -15); to make it work.
You'll need to do some work to export the file out of your application.
Here is my code (which I limited to 250 frames since it holds everything in RAM, but you can flush to disk if you need more frames):
// DepthCapture.swift
// AVCamPhotoFilter
//
// Created by Eyal Fink on 07/04/2018.
// Copyright © 2018 Resonai. All rights reserved.
//
// Capture the depth pixelBuffer into a compressed file.
// This is very hacky and there are lots of TODOs but instead we need to replace
// it with a much better compression (video compression)....
import AVFoundation
import Foundation
import Compression
class DepthCapture {
    let kErrorDomain = "DepthCapture"
    let maxNumberOfFrame = 250
    lazy var bufferSize = 640 * 480 * 2 * maxNumberOfFrame // maxNumberOfFrame frames
    var dstBuffer: UnsafeMutablePointer<UInt8>?
    var frameCount: Int64 = 0
    var outputURL: URL?
    var compresserPtr: UnsafeMutablePointer<compression_stream>?
    var file: FileHandle?
    // All operations handling the compresser objects are done on the
    // processingQ so they will happen sequentially
    var processingQ = DispatchQueue(label: "compression",
                                    qos: .userInteractive)
    func reset() {
        frameCount = 0
        outputURL = nil
        if let stream = compresserPtr {
            compression_stream_destroy(stream)
            // Free the stream struct allocated in prepareForRecording()
            stream.deallocate()
            compresserPtr = nil
        }
        // Free the output buffer allocated in prepareForRecording()
        dstBuffer?.deallocate()
        dstBuffer = nil
        if let file = file {
            file.closeFile()
            self.file = nil
        }
    }
    func prepareForRecording() {
        reset()
        // Create the output file, remove old one if exists
        let documentsPath = NSSearchPathForDirectoriesInDomains(.documentDirectory, .userDomainMask, true)[0] as NSString
        self.outputURL = URL(fileURLWithPath: documentsPath.appendingPathComponent("Depth"))
        FileManager.default.createFile(atPath: self.outputURL!.path, contents: nil, attributes: nil)
        self.file = FileHandle(forUpdatingAtPath: self.outputURL!.path)
        if self.file == nil {
            NSLog("Cannot create file at: \(self.outputURL!.path)")
            return
        }
        // Init the compression object
        compresserPtr = UnsafeMutablePointer<compression_stream>.allocate(capacity: 1)
        compression_stream_init(compresserPtr!, COMPRESSION_STREAM_ENCODE, COMPRESSION_ZLIB)
        dstBuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: bufferSize)
        compresserPtr!.pointee.dst_ptr = dstBuffer!
        compresserPtr!.pointee.dst_size = bufferSize
    }
    func flush() {
        // dst_size holds the space still free, so the difference is what was written
        let nBytes = bufferSize - compresserPtr!.pointee.dst_size
        print("Writing \(nBytes)")
        let data = Data(bytesNoCopy: dstBuffer!, count: nBytes, deallocator: .none)
        self.file?.write(data)
    }
    func startRecording() throws {
        processingQ.async {
            self.prepareForRecording()
        }
    }
    func addPixelBuffers(pixelBuffer: CVPixelBuffer) {
        processingQ.async {
            if self.frameCount >= self.maxNumberOfFrame {
                // TODO now!! flush when needed!!!
                print("MAXED OUT")
                return
            }
            CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
            // Unlock on every exit path, including the early returns below
            defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }
            let add : UnsafeMutableRawPointer = CVPixelBufferGetBaseAddress(pixelBuffer)!
            self.compresserPtr!.pointee.src_ptr = UnsafePointer<UInt8>(add.assumingMemoryBound(to: UInt8.self))
            let height = CVPixelBufferGetHeight(pixelBuffer)
            self.compresserPtr!.pointee.src_size = CVPixelBufferGetBytesPerRow(pixelBuffer) * height
            let flags = Int32(0)
            let compression_status = compression_stream_process(self.compresserPtr!, flags)
            if compression_status != COMPRESSION_STATUS_OK {
                NSLog("Buffer compression returned: \(compression_status)")
                return
            }
            if self.compresserPtr!.pointee.src_size != 0 {
                NSLog("Compression lib didn't eat all data: \(compression_status)")
                return
            }
            // TODO(eyal): flush when needed!!!
            self.frameCount += 1
            print("handled \(self.frameCount) buffers")
        }
    }
    func finishRecording(success: @escaping ((URL) -> Void)) throws {
        processingQ.async {
            let flags = Int32(COMPRESSION_STREAM_FINALIZE.rawValue)
            self.compresserPtr!.pointee.src_size = 0
            //compresserPtr!.pointee.src_ptr = UnsafePointer<UInt8>(0)
            let compression_status = compression_stream_process(self.compresserPtr!, flags)
            if compression_status != COMPRESSION_STATUS_END {
                NSLog("ERROR: Finish failed. compression returned: \(compression_status)")
                return
            }
            self.flush()
            DispatchQueue.main.sync {
                success(self.outputURL!)
            }
            self.reset()
        }
    }
}

Red5 CustomVideoSource Send Black & White Video

I am using the Red5 iOS code and its CustomVideoSource class. The stream publishes to the server successfully, but it shows up in black and white rather than in color.
If anyone has faced this issue, please help me find a solution.
Here is the code sample:
let contextImage = McamImage.shared.image
let image: CGImage? = contextImage.cgImage
let dataProvider: CGDataProvider? = image?.dataProvider
let data: CFData? = dataProvider?.data
if (data != nil) {
let baseAddress = CFDataGetBytePtr(data!)
//contextImage = nil
/*
* We own the copied CFData which will back the CVPixelBuffer, thus the data's lifetime is bound to the buffer.
* We will use a CVPixelBufferReleaseBytesCallback callback in order to release the CFData when the buffer dies.
*/
let unmanagedData = Unmanaged<CFData>.passRetained(data!)
var pixelBuffer: CVPixelBuffer?
var result = CVPixelBufferCreateWithBytes(nil,
(image?.width)!,
(image?.height)!,
kCVPixelFormatType_24RGB,
UnsafeMutableRawPointer( mutating: baseAddress!),
(image?.bytesPerRow)!,
{ releaseContext, baseAddress in
let contextData = Unmanaged<CFData>.fromOpaque(releaseContext!)
contextData.release()
},
unmanagedData.toOpaque(),
nil,
&pixelBuffer)
Thanks!

Spotify iOS SDK FFT with EZAudio returning NaN

I am trying to perform fft on Spotify's audio stream using EZAudio.
Following this suggestion, I have subclassed SPTCoreAudioController, overrode attemptToDeliverAudioFrames:ofCount:streamDescription:, and initialized my SPTAudioStreamingController with my new class successfully.
Spotify does not say whether the pointer passed into the overridden function points to floats, doubles, integers, or something else. I have interpreted it as many different data types, all of which failed, leaving me unsure whether my FFT is wrong or my audio buffer is wrong. Here is Spotify's documentation on SPTCoreAudioController.
Assuming the audio buffer is a buffer of floats, here is one of my attempts at FFT:
class GetAudioPCM : SPTCoreAudioController, EZAudioFFTDelegate{
let ViewControllerFFTWindowSize: vDSP_Length = 128
var fft: EZAudioFFTRolling?
//var fft: EZAudioFFT?
override func attempt(toDeliverAudioFrames audioFrames: UnsafeRawPointer!, ofCount frameCount: Int, streamDescription audioDescription: AudioStreamBasicDescription) -> Int {
if let fft = fft{
let newPointer = (UnsafeMutableRawPointer(mutating: audioFrames)!.assumingMemoryBound(to: Float.self))
let resultBuffer : UnsafeMutablePointer<Float> = (fft.computeFFT(withBuffer: newPointer, withBufferSize: 128))
print("results: \(resultBuffer.pointee)")
}else{
fft = EZAudioFFTRolling(windowSize: ViewControllerFFTWindowSize, sampleRate: Float(audioDescription.mSampleRate), delegate: self)
//fft = EZAudioFFT(maximumBufferSize: 128, sampleRate: Float(audioDescription.mSampleRate))
}
return super.attempt(toDeliverAudioFrames: audioFrames, ofCount: frameCount, streamDescription: audioDescription)
}
func fft(_ fft: EZAudioFFT!, updatedWithFFTData fftData: UnsafeMutablePointer<Float>, bufferSize: vDSP_Length) {
print("\n \n DATA ---------------------")
print(bufferSize)
if (fft?.fftData) != nil {
print("First: \(fftData.pointee)")
for i : Int in 0..<Int(bufferSize) {
print(fftData[i], terminator: " :: ")
}
}
}
}
I make my custom class the EZAudioFFTDelegate, I initialize an EZAudioFFTRolling (have also tried plain EZAudioFFT) object, and tell it to perform an fft with the buffer. I chose a small size of 128 just for initial testing.
I have tried different data types for the buffer and different methods of fft. I decided using a well known library that has fft should give me the correct results. Yet, my output from this and similar FFT's produced 'nan' for almost every single float in the new buffer.
Is the way I access spotify's audio buffer wrong, or is my FFT process at fault?
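One thing worth checking is the AudioStreamBasicDescription that is passed in: mFormatFlags, mBitsPerChannel and mChannelsPerFrame describe the sample layout. If it reports integer PCM, reinterpreting those bytes as Float yields arbitrary bit patterns, many of which are NaN or enormous values. Here is a minimal sketch of the conversion, assuming (not confirmed by the Spotify documentation) that the stream is 16-bit signed integer PCM; floatSamples(fromInt16:count:) is a hypothetical helper.

```swift
/// Converts 16-bit signed PCM samples to Float values in [-1, 1).
func floatSamples(fromInt16 samples: UnsafePointer<Int16>, count: Int) -> [Float] {
    let scale: Float = 32768  // magnitude of Int16.min
    return (0..<count).map { index -> Float in
        return Float(samples[index]) / scale
    }
}

let pcm: [Int16] = [0, 16384, -32768, 32767]
let floats = pcm.withUnsafeBufferPointer { buffer -> [Float] in
    return floatSamples(fromInt16: buffer.baseAddress!, count: buffer.count)
}
print(floats)
```

Inside the overridden method, the raw pointer would be bound with assumingMemoryBound(to: Int16.self), and for interleaved audio the sample count is frameCount multiplied by mChannelsPerFrame. The resulting Float array is what would be handed to computeFFT.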

Perform Audio Analysis with FFT

I've been stuck on this problem for days now and have looked through nearly every related StackOverflow page. Through this, I now have a much greater understanding of what FFT is and how it works. Despite this, I'm having extreme difficulties implementing it into my application.
In short, what I am trying to do is make a spectrum visualizer for my application (Similar to this). From what I've gathered, I'm pretty sure I need to use the magnitudes of the sound as the heights of my bars. So with all this in mind, currently I am able to analyze an entire .caf file all at once. To do this, I am using the following code:
let audioFile = try! AVAudioFile(forReading: soundURL!)
let frameCount = UInt32(audioFile.length)
let buffer = AVAudioPCMBuffer(PCMFormat: audioFile.processingFormat, frameCapacity: frameCount)
do {
try audioFile.readIntoBuffer(buffer, frameCount:frameCount)
} catch {
}
let log2n = UInt(round(log2(Double(frameCount))))
let bufferSize = Int(1 << log2n)
let fftSetup = vDSP_create_fftsetup(log2n, Int32(kFFTRadix2))
var realp = [Float](count: bufferSize/2, repeatedValue: 0)
var imagp = [Float](count: bufferSize/2, repeatedValue: 0)
var output = DSPSplitComplex(realp: &realp, imagp: &imagp)
vDSP_ctoz(UnsafePointer<DSPComplex>(buffer.floatChannelData.memory), 2, &output, 1, UInt(bufferSize / 2))
vDSP_fft_zrip(fftSetup, &output, 1, log2n, Int32(FFT_FORWARD))
var fft = [Float](count:Int(bufferSize / 2), repeatedValue:0.0)
let bufferOver2: vDSP_Length = vDSP_Length(bufferSize / 2)
vDSP_zvmags(&output, 1, &fft, 1, bufferOver2)
This works fine and outputs a long array of data. However, the problem with this code is it analyzes the entire audio file at once. What I need is to be analyzing the audio file as it is playing, very similar to this video: Spectrum visualizer.
So I guess my question is this: How do you perform FFT analysis while the audio is playing?
Also, on top of this, how do I go about converting the output of an FFT analysis to actual heights for a bar? One of the outputs I received for an audio file using the FFT analysis code from above was this: http://pastebin.com/RBLTuGx7. The only reason for the pastebin is due to how long it is. I'm assuming I average all these numbers together and use those values instead? (Just for reference, I got that array by printing out the 'fft' variable in the code above)
I've attempted reading through the EZAudio code, however I am unable to find how they are reading in samples of audio in live time. Any help is greatly appreciated.
Here's how it is done in AudioKit, using EZAudio's FFT tools:
Create a class for your FFT that will hold the data:
@objc public class AKFFT: NSObject, EZAudioFFTDelegate {
    internal let bufferSize: UInt32 = 512
    internal var fft: EZAudioFFT?
    /// Array of FFT data
    public var fftData = [Double](count: 512, repeatedValue: 0.0)
    ...
}
Initialize the class and setup the FFT. Also install the tap on the appropriate node.
public init(_ input: AKNode) {
    super.init()
    fft = EZAudioFFT.fftWithMaximumBufferSize(vDSP_Length(bufferSize), sampleRate: 44100.0, delegate: self)
    input.avAudioNode.installTapOnBus(0, bufferSize: bufferSize, format: AKManager.format) { [weak self] (buffer, time) -> Void in
        if let strongSelf = self {
            buffer.frameLength = strongSelf.bufferSize
            let offset: Int = Int(buffer.frameCapacity - buffer.frameLength)
            let tail = buffer.floatChannelData[0]
            strongSelf.fft!.computeFFTWithBuffer(&tail[offset], withBufferSize: strongSelf.bufferSize)
        }
    }
}
Then implement the callback to load your internal fftData array:
@objc public func fft(fft: EZAudioFFT!, updatedWithFFTData fftData: UnsafeMutablePointer<Float>, bufferSize: vDSP_Length) {
    dispatch_async(dispatch_get_main_queue()) { () -> Void in
        for i in 0...511 {
            self.fftData[i] = Double(fftData[i])
        }
    }
}
AudioKit's implementation may change so you should check https://github.com/audiokit/AudioKit/ to see if any improvements were made. EZAudio is at https://github.com/syedhali/EZAudio
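As for turning the FFT output into bar heights, one common approach is to average neighbouring bins into as many bands as there are bars, convert each band to decibels relative to the loudest bin, and clamp to a fixed range. The following is a minimal sketch in current Swift, not AudioKit's or EZAudio's code: barHeights and its floorDB parameter are hypothetical, and it assumes the input holds amplitude magnitudes. vDSP_zvmags returns squared magnitudes, so for that output the factor would be 10 instead of 20.

```swift
import Foundation

/// Averages FFT magnitudes into `barCount` bands and maps each band to
/// a height in 0...1 on a decibel scale relative to the loudest bin.
func barHeights(magnitudes: [Float], barCount: Int, floorDB: Float = -60) -> [Float] {
    guard barCount > 0, magnitudes.count >= barCount else { return [] }
    let binsPerBar = magnitudes.count / barCount
    let tiny = Float.leastNormalMagnitude  // avoids log10(0)
    let peak = max(magnitudes.max() ?? 0, tiny)
    return (0..<barCount).map { bar -> Float in
        let start = bar * binsPerBar
        let band = magnitudes[start..<(start + binsPerBar)]
        let mean = band.reduce(0, +) / Float(binsPerBar)
        // 0 dB is the loudest bin; quieter bands are negative.
        let db = 20 * log10(max(mean, tiny) / peak)
        // floorDB maps to height 0, 0 dB maps to height 1.
        return max(0, min(1, 1 - db / floorDB))
    }
}

let heights = barHeights(magnitudes: [1, 1, 0.1, 0.1, 0.001, 0.001, 0, 0], barCount: 4)
print(heights)
```

Equal-width bands are the simplest choice; visualizers often use logarithmically spaced bands instead so that low frequencies get more bars, which only changes how start and binsPerBar are computed.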
