I am trying to use the Google Mobile Vision API to detect when a user smiles from the camera feed. The problem is that Google Mobile Vision does not detect any faces, while Apple's Vision API immediately recognizes and tracks any face I test my app with. I am using func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) { } to process frames and detect when a user is smiling. How would I fix my code so that Google's API works as well? What am I doing wrong?
My Code...
var options = [GMVDetectorFaceTrackingEnabled: true,
               GMVDetectorFaceLandmarkType: GMVDetectorFaceLandmark.all.rawValue,
               GMVDetectorFaceMinSize: 0.15] as [String: Any]

var GfaceDetector = GMVDetector.init(ofType: GMVDetectorTypeFace, options: options)

extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {

    func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {

        let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
        let attachments = CMCopyDictionaryOfAttachments(kCFAllocatorDefault, sampleBuffer, kCMAttachmentMode_ShouldPropagate)
        let ciImage1 = CIImage(cvImageBuffer: pixelBuffer!, options: attachments as? [String: Any])
        let Gimage = UIImage(ciImage: ciImage1)

        var Gfaces = GfaceDetector?.features(in: Gimage, options: nil) as? [GMVFaceFeature]

        let options: [String: Any] = [CIDetectorImageOrientation: exifOrientation(orientation: UIDevice.current.orientation),
                                      CIDetectorSmile: true,
                                      CIDetectorEyeBlink: true]
        let allFeatures = faceDetector?.features(in: ciImage1, options: options)

        let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer)
        let cleanAperture = CMVideoFormatDescriptionGetCleanAperture(formatDescription!, false)

        var smilingProb = CGFloat()

        guard let features = allFeatures else { return }

        print("GFace \(Gfaces?.count)")
        // THE PRINT ABOVE RETURNS 0

        // MARK: ------ Google System Setup
        for face: GMVFaceFeature in Gfaces! {
            print("Google1")
            if face.hasSmilingProbability {
                print("Google \(face.smilingProbability)")
                smilingProb = face.smilingProbability
            }
        }

        for feature in features {
            if let faceFeature = feature as? CIFaceFeature {
                let faceRect = calculateFaceRect(facePosition: faceFeature.mouthPosition, faceBounds: faceFeature.bounds, clearAperture: cleanAperture)
                let featureDetails = ["has smile: \(faceFeature.hasSmile), \(smilingProb)",
                                      "has closed left eye: \(faceFeature.leftEyeClosed)",
                                      "has closed right eye: \(faceFeature.rightEyeClosed)"]
                update(with: faceRect, text: featureDetails.joined(separator: "\n"))
            }
        }

        if features.count == 0 {
            DispatchQueue.main.async {
                self.detailsView.alpha = 0.0
            }
        }
    }
}
UPDATE
I copied and pasted the Google Mobile Vision Detection Code into another app and it worked. The difference was that instead of constantly receiving frames, the app had only one image to analyze. Could this have something to do with how often I send a request or the format/quality of the CIImage?
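One thing I plan to try, inside captureOutput, is rendering the CIImage through a CIContext so that the UIImage is backed by a CGImage, in case the lazily evaluated CIImage is the issue. This is just a guess on my part and I have not verified that it is what GMVDetector requires:

    let context = CIContext()
    if let cgImage = context.createCGImage(ciImage1, from: ciImage1.extent) {
        // A CGImage-backed UIImage, in case GMVDetector cannot read a UIImage
        // that only wraps a CIImage (unverified guess).
        let backedImage = UIImage(cgImage: cgImage)
        let faces = GfaceDetector?.features(in: backedImage, options: nil) as? [GMVFaceFeature]
        print("GFace (CGImage-backed) \(faces?.count ?? 0)")
    }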
ANOTHER UPDATE
I have identified an issue with how my app works. It seems that the image the API receives is not upright or in line with the orientation of the phone. For example, if I hold my phone up in front of my face (in normal portrait mode), the image is rotated 90 degrees anticlockwise. I have absolutely no idea why this is happening, as the live camera preview looks normal. The Google docs say...
The face detector expects images and the faces in them to be in an upright orientation. If you need to rotate the image, pass in orientation information in the dictionary options with GMVDetectorImageOrientation key. The detector will rotate the images for you based on the orientation value.
New Question: (I believe the answer to either one of these questions will solve my problem)
A: How would I use the GMVDetectorImageOrientation key to set the orientation right?
B: How would I rotate the UIImage 90 degrees clockwise (NOT THE UIIMAGEVIEW)?
THIRD UPDATE
I have successfully rotated the image right side up, but Google Mobile Vision is still not detecting any faces. The image is a bit distorted, but I do not think the amount of distortion is affecting Google Mobile Vision's response. So...
How would I use the GMVDetectorImageOrientation key to set the orientation right?
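For what it's worth, my best guess at how the orientation option would be passed is something like the following; the GMVImageOrientation value is an assumption on my part and I have not confirmed which one matches my camera configuration:

    // Guess: pass the frame's orientation so GMVDetector rotates it internally.
    // .rightTop is an assumption for a portrait device; the correct value may differ.
    let gmvOptions: [AnyHashable: Any] = [
        GMVDetectorImageOrientation: GMVImageOrientation.rightTop.rawValue
    ]
    let faces = GfaceDetector?.features(in: Gimage, options: gmvOptions) as? [GMVFaceFeature]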
ANY HELP/RESPONSE IS APPRECIATED.
I am trying to improve the performance of drawing the skeleton from body tracking with VNDetectHumanBodyPoseRequest, even when the subject is further than 5 metres away and the iPhone XS camera is stable.
The tracking has low confidence for the lower right limbs of my body, there is noticeable lag, and there is jitter. I am unable to replicate the performance showcased in this year's WWDC demo video.
Here is the relevant code, adapted from Apple's sample code:
class Predictor {
    func extractPoses(_ sampleBuffer: CMSampleBuffer) throws -> [VNRecognizedPointsObservation] {
        let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: .down)
        let request = VNDetectHumanBodyPoseRequest()
        do {
            // Perform the body pose-detection request.
            try requestHandler.perform([request])
        } catch {
            print("Unable to perform the request: \(error).\n")
        }
        return (request.results as? [VNRecognizedPointsObservation]) ?? [VNRecognizedPointsObservation]()
    }
}
I've captured the video data and am handling the sample buffers here:
class CameraViewController: AVCaptureVideoDataOutputSampleBufferDelegate {

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        let observations = try? predictor.extractPoses(sampleBuffer)
        observations?.forEach { processObservation($0) }
    }

    func processObservation(_ observation: VNRecognizedPointsObservation) {
        // Retrieve all torso points.
        guard let recognizedPoints =
                try? observation.recognizedPoints(forGroupKey: .all) else {
            return
        }

        let storedPoints = Dictionary(uniqueKeysWithValues: recognizedPoints.compactMap { (key, point) -> (String, CGPoint)? in
            return (key.rawValue, point.location)
        })

        DispatchQueue.main.sync {
            let mappedPoints = Dictionary(uniqueKeysWithValues: recognizedPoints.compactMap { (key, point) -> (String, CGPoint)? in
                guard point.confidence > 0.1 else { return nil }
                let norm = VNImagePointForNormalizedPoint(point.location,
                                                          Int(drawingView.bounds.width),
                                                          Int(drawingView.bounds.height))
                return (key.rawValue, norm)
            })

            let time = 1000 * observation.timeRange.start.seconds

            // Draw the points onscreen.
            DispatchQueue.main.async {
                self.drawingView.draw(points: mappedPoints)
            }
        }
    }
}
The drawingView.draw function is for a custom UIView on top of the camera view, and draws the points using CALayer sublayers. The AVCaptureSession code is exactly the same as the sample code here.
I tried using the VNDetectHumanBodyPoseRequest(completionHandler:) variant, but this made no difference to the performance for me. I could try smoothing with a moving average filter, but there would still be a problem with the outlier predictions, which are very inaccurate.
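For reference, the kind of smoothing I have in mind is a simple per-joint moving average over the last few frames; this PointSmoother is only a sketch and is not wired into the code above:

    import CoreGraphics

    // Sketch of a per-joint moving-average filter (not integrated yet).
    // Keeps the last `windowSize` positions for each joint key and returns their mean.
    final class PointSmoother {
        private let windowSize: Int
        private var history: [String: [CGPoint]] = [:]

        init(windowSize: Int = 5) {
            self.windowSize = windowSize
        }

        func smooth(_ points: [String: CGPoint]) -> [String: CGPoint] {
            var smoothed: [String: CGPoint] = [:]
            for (key, point) in points {
                var window = history[key, default: []]
                window.append(point)
                if window.count > windowSize { window.removeFirst() }
                history[key] = window
                let sum = window.reduce(CGPoint.zero) { CGPoint(x: $0.x + $1.x, y: $0.y + $1.y) }
                smoothed[key] = CGPoint(x: sum.x / CGFloat(window.count),
                                        y: sum.y / CGFloat(window.count))
            }
            return smoothed
        }
    }

The idea would be to run mappedPoints through smooth(_:) before calling drawingView.draw(points:).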
What am I missing?
I think this was a bug in iOS 14 beta 1-3. After upgrading to beta 4 and the later beta releases, tracking is much better. The API also became a bit clearer, with more fine-grained type names, in the latest beta updates.
Note that I didn't get an official answer from Apple regarding this bug, but the problem will probably disappear completely in the official iOS 14 release.
I'm trying to take two images using the camera, and align them using the iOS Vision framework:
func align(firstImage: CIImage, secondImage: CIImage) {
    let request = VNTranslationalImageRegistrationRequest(targetedCIImage: firstImage) { request, error in
        if error != nil {
            fatalError()
        }
        let observation = request.results!.first as! VNImageTranslationAlignmentObservation
        let alignedSecondImage = secondImage.transformed(by: observation.alignmentTransform)
        let compositedImage = firstImage.applyingFilter("CIAdditionCompositing",
                                                        parameters: ["inputBackgroundImage": alignedSecondImage])
        // Save the compositedImage to the photo library.
    }
    try! visionHandler.perform([request], on: secondImage)
}

let visionHandler = VNSequenceRequestHandler()
But this produces grossly mis-aligned images:
You can see that I've tried three different types of scenes — a close-up subject, an indoor scene, and an outdoor scene. I tried more outdoor scenes, and the result is the same in almost every one of them.
I was expecting a slight misalignment at worst, but not such a complete misalignment. What is going wrong?
I'm not passing the orientation of the images into the Vision framework, but that shouldn't be a problem for aligning images. It's a problem only for things like face detection, where a rotated face isn't detected as a face. In any case, the output images have the correct orientation, so orientation is not the problem.
My compositing code is working correctly; it's only the Vision framework that's the problem. If I remove the calls to the Vision framework and put the phone on a tripod, the composition works perfectly and there's no misalignment. So the problem is the Vision framework.
This is on iPhone X.
How do I get Vision framework to work correctly? Can I tell it to use gyroscope, accelerometer and compass data to improve the alignment?
You should set secondImage as the targeted image, and perform the handler with firstImage.
I used your compositing approach.
Check out this example from MLBoy:
let request = VNTranslationalImageRegistrationRequest(targetedCIImage: image2, options: [:])
let handler = VNImageRequestHandler(ciImage: image1, options: [:])
do {
    try handler.perform([request])
} catch let error {
    print(error)
}
guard let observation = request.results?.first as? VNImageTranslationAlignmentObservation else { return }
let alignmentTransform = observation.alignmentTransform
image2 = image2.transformed(by: alignmentTransform)
let compositedImage = image1.applyingFilter("CIAdditionCompositing", parameters: ["inputBackgroundImage": image2])
I've set up an AVCaptureSession with a video data output and am attempting to use iOS 11's Vision framework to read QR codes. The camera is set up like basically any AVCaptureSession; I will abbreviate and just show setting up the output.
let output = AVCaptureVideoDataOutput()
output.setSampleBufferDelegate(self, queue: captureQueue)
captureSession.addOutput(output)
// I did this to get the CVPixelBuffer to be oriented in portrait.
// I don't know if it's needed and I'm not sure it matters anyway.
output.connection(with: .video)!.videoOrientation = .portrait
So the camera is up and running as always. Here is the code I am using to perform a VNImageRequestHandler for QR codes.
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up, options: [:])

    let qrRequest = VNDetectBarcodesRequest { request, error in
        let barcodeObservations = request.results as? [VNBarcodeObservation]
        guard let qrCode = barcodeObservations?.flatMap({ $0.barcodeDescriptor as? CIQRCodeDescriptor }).first else { return }
        if let code = String(data: qrCode.errorCorrectedPayload, encoding: .isoLatin1) {
            debugPrint(code)
        }
    }
    qrRequest.symbologies = [.QR]

    try! imageRequestHandler.perform([qrRequest])
}
I am using a QR code that encodes http://www.google.com as a test. The debugPrint line prints out:
AVGG\u{03}¢ò÷wwrævöövÆRæ6öÐì\u{11}ì
I have tested this same QR code with the AVCaptureMetadataOutput that has been around for a while and that method decodes the QR code correctly. So my question is, what have I missed to get the output that I am getting?
(Obviously I could just use the AVCaptureMetadataOutput as a solution, because I can see that it works. But that doesn't help me learn how to use the Vision framework.)
Most likely the problem is here:
if let code = String(data: qrCode.errorCorrectedPayload, encoding: .isoLatin1)
Try to use .utf8.
Also, I would suggest looking at the raw output of errorCorrectedPayload without any encoding. Maybe it already has the correct encoding.
The definition of errorCorrectedPayload says:
-- QR Codes are formally specified in ISO/IEC 18004:2006(E). Section 6.4.10 "Bitstream to codeword conversion" specifies the set of 8-bit codewords in the symbol immediately prior to splitting the message into blocks and applying error correction. --
This seems to work fine with VNBarcodeObservation.payloadStringValue instead of transforming VNBarcodeObservation.barcodeDescriptor.
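For example, a minimal version of the completion handler using payloadStringValue, keeping the rest of the original setup unchanged, would be something like:

    // Sketch: read the decoded string directly from the observation,
    // instead of decoding errorCorrectedPayload by hand.
    let qrRequest = VNDetectBarcodesRequest { request, error in
        guard let observations = request.results as? [VNBarcodeObservation] else { return }
        if let payload = observations.first?.payloadStringValue {
            debugPrint(payload) // should print "http://www.google.com" for the test code above
        }
    }
    qrRequest.symbologies = [.QR]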
I am trying to use Core Image's CIFaceFeature to detect face emotions, since those are the native APIs. I have created a sample view controller project and added the related code. When I launch this iOS application, it opens the camera. When I look at the camera and smile, the sample code below detects it fine.
I also need to detect other emotions such as surprise, sadness and anger. I understand that CIFaceFeature doesn't have direct APIs for these other emotions. But is it possible to combine the available properties (such as hasSmile, leftEyeClosed, rightEyeClosed, etc.) to detect emotions like surprise, sadness and anger in an iOS program?
If anyone has worked with these APIs or a similar scenario, please suggest and share your ideas.
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {

    let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
    let opaqueBuffer = Unmanaged<CVImageBuffer>.passUnretained(imageBuffer!).toOpaque()
    let pixelBuffer = Unmanaged<CVPixelBuffer>.fromOpaque(opaqueBuffer).takeUnretainedValue()
    let sourceImage = CIImage(cvPixelBuffer: pixelBuffer, options: nil)

    options = [CIDetectorSmile: true as AnyObject,
               CIDetectorEyeBlink: true as AnyObject,
               CIDetectorImageOrientation: 6 as AnyObject]

    let features = self.faceDetector!.features(in: sourceImage, options: options)

    for feature in features as! [CIFaceFeature] {
        if feature.hasSmile {
            DispatchQueue.main.async {
                self.updateSmileEmotion()
            }
        } else {
            DispatchQueue.main.async {
                self.resetEmotionLabel()
            }
        }
    }
}

func updateSmileEmotion() {
    self.emtionLabel.text = " "
    self.emtionLabel.text = "HAPPY"
}

func resetEmotionLabel() {
    self.emtionLabel.text = " "
}
There are a variety of libraries that can do sentiment analysis on images, and most of those rely on machine learning. It's highly unlikely that you are going to get the same kind of results by just looking at what CIFaceFeature gives you, because it's pretty limited even in comparison to other facial recognition libraries. See Google Cloud Vision, IBM Watson Cloud iOS SDK, and Microsoft Cognitive Services.
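That said, if you only want a very rough heuristic built from the flags CIFaceFeature already exposes, a sketch might look like the following; the mapping is purely illustrative and nowhere near real emotion detection:

    import CoreImage

    // Purely illustrative mapping from CIFaceFeature flags to a label.
    // CIFaceFeature only reports smile and eye-blink state, so anything
    // beyond "happy" here is guesswork, not genuine emotion recognition.
    func roughEmotionLabel(for feature: CIFaceFeature) -> String {
        switch (feature.hasSmile, feature.leftEyeClosed, feature.rightEyeClosed) {
        case (true, _, _):
            return "HAPPY"
        case (false, true, true):
            return "EYES CLOSED"
        case (false, true, false), (false, false, true):
            return "WINKING"
        default:
            return "NEUTRAL"
        }
    }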
I'm using AVCaptureSession to create a QR code scanner with AVCaptureMetadataOutput.
Everything is working as expected; however, I want to put a graphical overlay on the scanner. In doing so, I'd like the scanner to only scan once the QR code is in a given section of the frame. Currently, it detects the QR code anywhere in the view, and I'd like it to trigger only when the code is in the middle of the screen.
Is this even possible? On the AVCaptureVideoPreviewLayer I'm setting rectForMetadataOutputRectOfInterest, but it doesn't seem to be working. Maybe I'm doing this wrong?
Some insight would be great. Thanks in advance!
I have not done this, but I believe it can be achieved. When you receive the AVCaptureMetadataOutput callback, you can use the AVCaptureVideoPreviewLayer to get the QR code's frame within the view; that's useful when you want to draw a rectangle around the captured QR code.
func metadataOutput(_ output: AVCaptureMetadataOutput, didOutput metadataObjects: [AVMetadataObject], from connection: AVCaptureConnection) {
    guard let readableCode = metadataObjects.first as? AVMetadataMachineReadableCodeObject,
          let code = readableCode.stringValue else { return }

    if let barcodeObject = videoPreview?.transformedMetadataObject(for: readableCode) {
        qrCodeFrameView?.frame = barcodeObject.bounds
    }

    stopReading()
    didRead?(code)
}
You could then use that barcodeObject to check whether your rect contains the barcode, using CGRect's contains(_:) method:
if qrReader.frame.contains(barcodeObject.bounds) {
    stopReading()
    didRead?(code)
}
I think that could/should work.
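Alternatively, if you want detection itself restricted to the middle of the screen rather than filtering results afterwards, I believe AVCaptureMetadataOutput's rectOfInterest can do that. A rough sketch, assuming you have previewLayer (an AVCaptureVideoPreviewLayer) and metadataOutput properties, and that you set this after the session has started running:

    // Sketch: limit metadata detection to a centred 200x200 region of the preview.
    // rectOfInterest is expressed in metadata-output coordinates, so convert from
    // layer coordinates first.
    let scanRect = CGRect(x: view.bounds.midX - 100,
                          y: view.bounds.midY - 100,
                          width: 200,
                          height: 200)
    metadataOutput.rectOfInterest = previewLayer.metadataOutputRectConverted(fromLayerRect: scanRect)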