How to define a custom detector for iOS AVFoundation?

I want to detect some triangle patterns from the camera input on an iPhone. I found some example code that can detect QR/bar codes using AVFoundation. The main part seems to be the AVMetadataMachineReadableCodeObject class. Here is some sample code from AppCoda:
func captureOutput(captureOutput: AVCaptureOutput!, didOutputMetadataObjects metadataObjects: [AnyObject]!, fromConnection connection: AVCaptureConnection!) {
    // Check that the metadataObjects array is not nil and contains at least one object.
    if metadataObjects == nil || metadataObjects.count == 0 {
        qrCodeFrameView?.frame = CGRectZero
        messageLabel.text = "No barcode/QR code is detected"
        return
    }

    // Get the metadata object.
    let metadataObj = metadataObjects[0] as! AVMetadataMachineReadableCodeObject

    // Here we use the contains method to check whether the type of metadataObj is supported.
    // Instead of hardcoding AVMetadataObjectTypeQRCode, we check if the type
    // can be found in the array of supported bar codes.
    if supportedBarCodes.contains(metadataObj.type) {
        // if metadataObj.type == AVMetadataObjectTypeQRCode {
        // If the found metadata is equal to the QR code metadata, update the status label's text and set the bounds.
        let barCodeObject = videoPreviewLayer?.transformedMetadataObjectForMetadataObject(metadataObj)
        qrCodeFrameView?.frame = barCodeObject!.bounds

        if metadataObj.stringValue != nil {
            messageLabel.text = metadataObj.stringValue
        }
    }
}
In the above code, once a QR code is detected, a bounding box is drawn and the text label is updated. Similarly, the AVMetadataFaceObject class is used in face detection applications. I saw from the reference that both classes are subclasses of AVMetadataObject.
I'm wondering whether it is possible to build a custom triangle detector by writing a subclass of AVMetadataObject, say AVMetadataTriangleObject. (I have a readily available detection algorithm written in Matlab; transcribing it into Swift shouldn't be tough.) If this approach is not possible, can anyone suggest alternative ways of achieving the above goal?
Thank you so much!

AVMetadataObject provides no hooks for implementing such a thing, and apart from AVAssetResourceLoader, AVFoundation does not allow much extension.
I think you should transcribe your algorithm to Swift and run it against the CMSampleBuffer images you get when capturing with an AVCaptureVideoDataOutput.
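For reference, here is a minimal sketch of that approach, assuming the detection algorithm lives in a hypothetical detectTriangles(in:) function (standing in for the transcribed Matlab code); session setup is reduced to the essentials and error handling is omitted:

    import AVFoundation
    import CoreGraphics

    final class TriangleScanner: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
        let session = AVCaptureSession()
        private let outputQueue = DispatchQueue(label: "triangle.detection.queue")

        func start() throws {
            guard let camera = AVCaptureDevice.default(for: .video) else { return }
            let input = try AVCaptureDeviceInput(device: camera)
            if session.canAddInput(input) { session.addInput(input) }

            let output = AVCaptureVideoDataOutput()
            // BGRA frames are convenient for CPU-side pixel access.
            output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
            output.alwaysDiscardsLateVideoFrames = true
            output.setSampleBufferDelegate(self, queue: outputQueue)
            if session.canAddOutput(output) { session.addOutput(output) }

            session.startRunning()
        }

        // Called once per captured frame on outputQueue.
        func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
            guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
            let triangles = detectTriangles(in: pixelBuffer) // your transcribed algorithm
            if !triangles.isEmpty {
                // Hop to the main queue here to draw overlays, update labels, etc.
            }
        }

        // Placeholder for the transcribed Matlab algorithm.
        private func detectTriangles(in pixelBuffer: CVPixelBuffer) -> [CGPoint] {
            CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
            defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }
            // Inspect CVPixelBufferGetBaseAddress / GetWidth / GetHeight here.
            return []
        }
    }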

Related

Google Mobile Vision not detecting any faces from a CIImage

I am trying to use the Google Mobile Vision API to detect when a user smiles from the camera feed. The problem I have is that the Google Mobile Vision API is not detecting any faces, while Apple's Vision API immediately recognizes and tracks any face that I test my app with. I am using func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) { } to detect when a user is smiling. Apple's API seems to work fine, but Google's API does not detect any faces. How would I fix my code to get Google's API to work as well? What am I doing wrong?
My Code...
var options = [GMVDetectorFaceTrackingEnabled: true,
               GMVDetectorFaceLandmarkType: GMVDetectorFaceLandmark.all.rawValue,
               GMVDetectorFaceMinSize: 0.15] as [String : Any]
var GfaceDetector = GMVDetector.init(ofType: GMVDetectorTypeFace, options: options)

extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
        let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
        let attachments = CMCopyDictionaryOfAttachments(kCFAllocatorDefault, sampleBuffer, kCMAttachmentMode_ShouldPropagate)
        let ciImage1 = CIImage(cvImageBuffer: pixelBuffer!, options: attachments as! [String : Any]?)
        let Gimage = UIImage(ciImage: ciImage1)

        var Gfaces = GfaceDetector?.features(in: Gimage, options: nil) as? [GMVFaceFeature]

        let options: [String : Any] = [CIDetectorImageOrientation: exifOrientation(orientation: UIDevice.current.orientation),
                                       CIDetectorSmile: true,
                                       CIDetectorEyeBlink: true]
        let allFeatures = faceDetector?.features(in: ciImage1, options: options)

        let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer)
        let cleanAperture = CMVideoFormatDescriptionGetCleanAperture(formatDescription!, false)

        var smilingProb = CGFloat()

        guard let features = allFeatures else { return }

        print("GFace \(Gfaces?.count)")
        // THE PRINT ABOVE RETURNS 0

        // MARK: ------ Google System Setup
        for face: GMVFaceFeature in Gfaces! {
            print("Google1")
            if face.hasSmilingProbability {
                print("Google \(face.smilingProbability)")
                smilingProb = face.smilingProbability
            }
        }

        for feature in features {
            if let faceFeature = feature as? CIFaceFeature {
                let faceRect = calculateFaceRect(facePosition: faceFeature.mouthPosition, faceBounds: faceFeature.bounds, clearAperture: cleanAperture)
                let featureDetails = ["has smile: \(faceFeature.hasSmile), \(smilingProb)",
                                      "has closed left eye: \(faceFeature.leftEyeClosed)",
                                      "has closed right eye: \(faceFeature.rightEyeClosed)"]
                update(with: faceRect, text: featureDetails.joined(separator: "\n"))
            }
        }

        if features.count == 0 {
            DispatchQueue.main.async {
                self.detailsView.alpha = 0.0
            }
        }
    }
}
UPDATE
I copied and pasted the Google Mobile Vision Detection Code into another app and it worked. The difference was that instead of constantly receiving frames, the app had only one image to analyze. Could this have something to do with how often I send a request or the format/quality of the CIImage?
ANOTHER UPDATE
I have identified an issue with how my app works. It seems that the image the API is receiving is not upright or in line with the orientation of the phone. For example, if I hold my phone up in front of my face (in normal portrait mode), the image is rotated 90 degrees anticlockwise. I have absolutely no idea why this is happening, as the live camera preview is normal. The Google docs say...
The face detector expects images and the faces in them to be in an upright orientation. If you need to rotate the image, pass in orientation information in the dictionary options with GMVDetectorImageOrientation key. The detector will rotate the images for you based on the orientation value.
New Question: (I believe the answer to either one of these questions will solve my problem)
A: How would I use the GMVDetectorImageOrientation key to set the orientation right?
B: How would I rotate the UIImage 90 degrees clockwise (NOT THE UIIMAGEVIEW)?
THIRD UPDATE
I have successfully rotated the image right side up, but Google Mobile Vision is still not detecting any faces. The image is a bit distorted, but I do not think the amount of distortion is affecting Google Mobile Vision's response. So...
How would I use the GMVDetectorImageOrientation key to set the orientation right?
ANY HELP/RESPONSE IS APPRECIATED.
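For what it's worth, the docs excerpt above suggests passing the orientation through the options dictionary. A minimal sketch, assuming the GMVDetectorImageOrientation key from the GoogleMobileVision headers accepts an EXIF-style GMVImageOrientation value (the value 6 below is an assumption for portrait capture with the back camera, not a confirmed fix):

    // Sketch only: pass the frame's orientation so the detector rotates the image itself.
    // 6 corresponds to GMVImageOrientationRightTop (EXIF "right, top"); adjust per device orientation.
    let gmvOptions: [AnyHashable: Any] = [GMVDetectorImageOrientation: 6]
    let Gfaces = GfaceDetector?.features(in: Gimage, options: gmvOptions) as? [GMVFaceFeature]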

QR Reader with VNDetectBarcodeRequest

I've set up an AVCaptureSession with a video data output and am attempting to use iOS 11's Vision framework to read QR codes. The camera is set up like basically any AVCaptureSession. I will abbreviate and just show setting up the output.
let output = AVCaptureVideoDataOutput()
output.setSampleBufferDelegate(self, queue: captureQueue)
captureSession.addOutput(output)
// I did this to get the CVPixelBuffer to be oriented in portrait.
// I don't know if it's needed and I'm not sure it matters anyway.
output.connection(with: .video)!.videoOrientation = .portrait
So the camera is up and running as always. Here is the code I am using to perform a VNImageRequestHandler for QR codes.
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up, options: [:])
    let qrRequest = VNDetectBarcodesRequest { request, error in
        let barcodeObservations = request.results as? [VNBarcodeObservation]
        guard let qrCode = barcodeObservations?.flatMap({ $0.barcodeDescriptor as? CIQRCodeDescriptor }).first else { return }
        if let code = String(data: qrCode.errorCorrectedPayload, encoding: .isoLatin1) {
            debugPrint(code)
        }
    }
    qrRequest.symbologies = [.QR]
    try! imageRequestHandler.perform([qrRequest])
}
I am using a QR code that encodes http://www.google.com as a test. The debugPrint line prints out:
AVGG\u{03}¢ò÷wwrævöövÆRæ6öÐì\u{11}ì
I have tested this same QR code with the AVCaptureMetadataOutput that has been around for a while and that method decodes the QR code correctly. So my question is, what have I missed to get the output that I am getting?
(Obviously I could just use the AVCaptureMetadataOutput as a solution, because I can see that it works. But that doesn't help me learn how to use the Vision framework.)
Most likely the problem is here:
if let code = String(data: qrCode.errorCorrectedPayload, encoding: .isoLatin1)
Try to use .utf8.
Also, I would suggest looking at the raw output of errorCorrectedPayload without any decoding. Maybe it already has the correct encoding.
The definition of errorCorrectedPayload says:
-- QR Codes are formally specified in ISO/IEC 18004:2006(E). Section 6.4.10 "Bitstream to codeword conversion" specifies the set of 8-bit codewords in the symbol immediately prior to splitting the message into blocks and applying error correction. --
This seems to work fine with VNBarcodeObservation.payloadStringValue instead of transforming VNBarcodeObservation.barcodeDescriptor.
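For reference, a minimal sketch of the payloadStringValue route inside the same completion handler (no manual decoding of the error-corrected codewords needed):

    let qrRequest = VNDetectBarcodesRequest { request, _ in
        guard let observations = request.results as? [VNBarcodeObservation] else { return }
        // payloadStringValue already applies the QR mode and encoding rules for you.
        if let payload = observations.compactMap({ $0.payloadStringValue }).first {
            debugPrint(payload) // e.g. "http://www.google.com"
        }
    }
    qrRequest.symbologies = [.QR]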

CoreImage CIFeature for detecting face emotions

I am trying to use CoreImage's CIFeature for detecting face emotions, as those are the native APIs. I have created a sample view controller project and added the related code. When I launch the iOS application, it opens up the camera. When I look at the camera and smile, the sample code below detects it fine.
I also need to find other emotions such as surprise, sadness and anger. I understand that CoreImage's CIFeature doesn't have direct APIs for these other emotions. But is it possible to combine the available APIs (such as hasSmile, leftEyeClosed, rightEyeClosed, etc.) to detect emotions such as surprise, sadness and anger in an iOS program?
If anyone has worked with these APIs or this scenario, please suggest and share your ideas.
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
    let opaqueBuffer = Unmanaged<CVImageBuffer>.passUnretained(imageBuffer!).toOpaque()
    let pixelBuffer = Unmanaged<CVPixelBuffer>.fromOpaque(opaqueBuffer).takeUnretainedValue()
    let sourceImage = CIImage(cvPixelBuffer: pixelBuffer, options: nil)
    options = [CIDetectorSmile: true as AnyObject, CIDetectorEyeBlink: true as AnyObject, CIDetectorImageOrientation: 6 as AnyObject]

    let features = self.faceDetector!.features(in: sourceImage, options: options)

    for feature in features as! [CIFaceFeature] {
        if feature.hasSmile {
            DispatchQueue.main.async {
                self.updateSmileEmotion()
            }
        } else {
            DispatchQueue.main.async {
                self.resetEmotionLabel()
            }
        }
    }
}

func updateSmileEmotion() {
    self.emtionLabel.text = " "
    self.emtionLabel.text = "HAPPY"
}

func resetEmotionLabel() {
    self.emtionLabel.text = " "
}
There are a variety of libraries that can do sentiment analysis on images, and most of those rely on machine learning. It's highly unlikely that you are going to get the same kind of results by just looking at what CIFeature gives you, because it's pretty limited even in comparison to other facial recognition libraries. See Google Cloud Vision, IBM Watson Cloud iOS SDK, and Microsoft Cognitive Services.
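To illustrate how little the question's idea of combining those flags can express, here is a crude sketch built only on the booleans CIFaceFeature exposes; the mapping is a guess for illustration, not a validated emotion model:

    // Crude illustration only: CIFaceFeature offers a handful of booleans, so anything
    // beyond "smiling" is guesswork rather than real emotion detection.
    func roughEmotion(for face: CIFaceFeature) -> String {
        if face.hasSmile {
            return "HAPPY"
        }
        if face.leftEyeClosed && face.rightEyeClosed {
            return "EYES CLOSED" // cannot distinguish sad, tired or blinking
        }
        // No smile and eyes open: nothing here separates surprise, sadness or anger.
        return "NEUTRAL / UNKNOWN"
    }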

AVCaptureSession - Capturing within a certain section of the frame

I'm using AVCaptureSession to create a QR code scanner with AVCaptureMetadataOutput.
Everything is working as expected, however I want to put a graphical overlay on the scanner. In doing so, I'd like the scanner to only scan once the QR code is in a given section of the frame. Currently, it detects the QR code anywhere that's in the view, and I'd like it to trigger only when it's in the middle of the screen.
Is this even possible? On the AVCaptureVideoPreviewLayer I'm setting rectForMetadataOutputRectOfInterest, but it doesn't seem to be working. Maybe I'm doing this wrong?
Some insight would be great. Thanks in advance!
I have not done this, but I believe it can be achieved. When you catch the AVCaptureMetadataOutput callback, you can use the AVCaptureVideoPreviewLayer to get the QR code's frame within the view; that's useful when you want to draw a rectangle around the captured QR code.
func metadataOutput(_ output: AVCaptureMetadataOutput, didOutput metadataObjects: [AVMetadataObject], from connection: AVCaptureConnection) {
    guard let readableCode = metadataObjects.first as? AVMetadataMachineReadableCodeObject, let code = readableCode.stringValue else { return }

    if let barcodeObject = videoPreview?.transformedMetadataObject(for: readableCode) {
        qrCodeFrameView?.frame = barcodeObject.bounds
    }

    stopReading()
    didRead?(code)
}
You could then use that barcodeObject to ask whether your rect contains the barcode, using CGRect's contains(_:) method:
if qrReader.frame.contains(barcodeObject.bounds) {
    stopReading()
    didRead?(code)
}
I think that could/should work.
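Another option, sketched below under the assumption that previewLayer is your AVCaptureVideoPreviewLayer, metadataOutput your AVCaptureMetadataOutput, and scanAreaView the on-screen box: restrict detection at the source by setting the output's rectOfInterest after the session has started running, so codes outside the box are never reported at all.

    // Sketch: rectOfInterest is normalized and in the rotated capture coordinate space,
    // so convert the layer rect rather than building it by hand.
    captureSession.startRunning()
    let scanRect = scanAreaView.frame // in the preview layer's coordinate space
    metadataOutput.rectOfInterest = previewLayer.metadataOutputRectConverted(fromLayerRect: scanRect)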

AVFoundation barcode scanner detecting AVMetadataMachineReadableCodeObject vs AVMetadataFaceObject

Compiling with Swift 2.0, running iOS 9.3, working with Xcode 7.2.1 under OS X 10.11.3.
Trying to implement a Code 39 bar code scanner, but unable to figure out how to cast this without crashing my code in the process, unless I use this hack.
AVFoundation returns both AVMetadataFaceObject & AVMetadataMachineReadableCodeObject objects when scanning my bar code.
If I try to cast the wrong object into the wrong type it crashes, and for the life of me I cannot seem to find a way of figuring out which type of code it is looking at, beyond this hack.
Tried guard statement; no crash. Tried do/catch; no crashes. Tried to test the type, but nothing seems to work.
if metadataObjects == nil || metadataObjects.count == 0 {
    //print("No code is detected")
    return
} else {
    let A1 = String(metadataObjects[0])
    if (A1.hasPrefix("<AVMetadataFaceObject:")) {
        print("Face -> \(A1)")
    }
    if (A1.hasPrefix("<AVMetadataMachineReadableCodeObject:")) {
        let metadataObj = metadataObjects[0] as! AVMetadataMachineReadableCodeObject
        self.lblDataInfo.text = metadataObj.stringValue
        self.lblDataType.text = metadataObj.type
        print("Machine -> \(A1) ")
    }
}
OK, this works, but I don't think it's good coding practice; I fear it is a hack.
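For comparison, a conditional cast avoids both the crash and the string-prefix check; a minimal sketch using the same labels as the snippet above:

    // Sketch: `as?` simply fails for the wrong type instead of crashing like a forced cast,
    // so each metadata object can be handled in its own branch.
    for object in metadataObjects {
        if let code = object as? AVMetadataMachineReadableCodeObject {
            self.lblDataInfo.text = code.stringValue
            self.lblDataType.text = code.type
            print("Machine readable code -> \(code.type)")
        } else if let face = object as? AVMetadataFaceObject {
            print("Face -> \(face.faceID)")
        }
    }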
