CoreML Output labels NSCFString - Labels not showing correctly - ios

I am working on an iOS app where I need to use a CoreML model to perform image classification.
I used Google Cloud Platform AutoML Vision to train the model. Google provides a CoreML version of the model and I downloaded it to use in my app.
I followed Google's tutorial and everything appeared to be going smoothly. However when it use time to start using the model and got very strange prediction. I got the confidence of the prediction and then I got a very strange string that I didn't know what it was.
<VNClassificationObservation: 0x600002091d40> A7DBD70C-541C-4112-84A4-C6B4ED2EB7E2 requestRevision=1 confidence=0.332127 "CICAgICAwPmveRIJQWdsYWlzX2lv"
The string I am referring to is CICAgICAwPmveRIJQWdsYWlzX2lv.
After some research and debugging I found out that this is a NSCFString.
https://developer.apple.com/documentation/foundation/1395135-nsclassfromstring
Apparently this is part of the foundation API. Does anyone has any experience with this?
With the CoreML file also comes a dict.txt file with the correct labels. Do I have to convert this string to the labels? How do I do that.
This the code I have so far.
//
// Classification.swift
// Lepidoptera
//
// Created by Tomás Mamede on 15/09/2020.
// Copyright © 2020 Tomás Santiago. All rights reserved.
//
import Foundation
import SwiftUI
import Vision
import CoreML
import ImageIO
class Classification {
private lazy var classificationRequest: VNCoreMLRequest = {
do {
let model = try VNCoreMLModel(for: AutoML().model)
let request = VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
if let classifications = request.results as? [VNClassificationObservation] {
print(classifications.first ?? "No classification!")
}
})
request.imageCropAndScaleOption = .scaleFit
return request
}
catch {
fatalError("Error! Can't use Model.")
}
}()
func classifyImage(receivedImage: UIImage) {
let orientation = CGImagePropertyOrientation(rawValue: UInt32(receivedImage.imageOrientation.rawValue))
if let image = CIImage(image: receivedImage) {
DispatchQueue.global(qos: .userInitiated).async {
let handler = VNImageRequestHandler(ciImage: image, orientation: orientation!)
do {
try handler.perform([self.classificationRequest])
}
catch {
fatalError("Error classifying image!")
}
}
}
}
}

The labels are stored in your mlmodel file. If you open the mlmodel in the Xcode 12 model viewer, it will display what those labels are.
My guess is that instead of actual labels, your mlmodel file contains "CICAgICAwPmveRIJQWdsYWlzX2lv" and so on.
It looks like Google's AutoML does not put the correct class labels into the Core ML model.
You can make a dictionary in the app that maps "CICAgICAwPmveRIJQWdsYWlzX2lv" and so on to the real labels.
Or you can replace these labels inside the mlmodel file by editing it using coremltools. (My e-book Core ML Survival Guide has a chapter on how to replace the labels in the model.)

Related

Unable to detect QRCode from image using MLKit

I am using MLKIt for detect QRCode from image. for andrid it is working proper, for ios I am using below pods
pod 'GoogleMLKit/BarcodeScanning'
Here is sample code detect QRcode from image which picked from gallery. every time features array comes empty.
let format: BarcodeFormat = BarcodeFormat.all
let barcodeOptions = BarcodeScannerOptions(formats: format)
let visionImage = VisionImage(image: image)
visionImage.orientation = image.imageOrientation
let barcodeScanner = BarcodeScanner.barcodeScanner(options: barcodeOptions)
barcodeScanner.process(visionImage) { features, error in
guard error == nil, let features = features, !features.isEmpty else {
// Error handling
return
}
// Recognized barcodes
print("Data :: \(features.first?.rawValue ?? "")")
}
We noticed this may happen when there are no padding around the QR code, I also tried to add some padding to it: and it works after that. Could you confirm that it works?
On the other side, ML Kit is also working on a public document on this limitation. Thanks for reporting this.
Julie from ML Kit team

Why my pre-trained mlmodel is so wrong in object recognition?

recently I wanted to check out CoreML and CreateML so I created simple app with object recognition.
I created model only for bananas and carrots (just for a try).I used over 60 images to trained my model and in Create ML app the training process looked fine.
Everything was going great until I printed out the results in the console and I saw that my model is 100% confident that waterfall is a banana ...
Ideally, I thought the output would be 0% confidence for banana and 0% confidence for carrots (because I used image of waterfall).
Could You explain me why the output look like this and give any kind of advice how to improve my app ?
This is my code for image recognition :
func recognizeObject (image: CIImage) {
guard let myModel = try? VNCoreMLModel(for: FruitVegeClassifier_1().model) else {
fatalError("Couldn't load ML Model")
}
let recognizeRequest = VNCoreMLRequest(model: myModel) { (recognizeRequest, error) in
guard let output = recognizeRequest.results as? [VNClassificationObservation] else {
fatalError("Your model failed !")
}
print(output)
}
let handler = VNImageRequestHandler(ciImage: image)
do {
try handler.perform([recognizeRequest])
} catch {
print(error)
}
}
in the console we can see that :
[<VNClassificationObservation: 0x600001c77810> 24503983-5770-4F43-8078-F3F6243F47B2 requestRevision=1 confidence=1.000000 "banana", <VNClassificationObservation: 0x600001c77840> E73BFBAE-D6E1-4D31-A2AE-0B3C860EAF99 requestRevision=1 confidence=0.000000 "carrot"]
and the image looks like this :
Thanks for any help !
If you only trained on images of bananas and carrots, the model should only be used on images of bananas and carrots.
When you give it a totally different kind of images, it will try to match it to the patterns it has learned, which are either bananas or carrots and nothing else.
In other words, these models do not work they way you were expecting them to.

Swift: Process UIImage data for use in Firebase custom TFLite model

I am using Swift, Firebase, and Tensorflow to build an image recognition model. I have a re-trained MobileNet model that takes an input array of [1,224,224,3] copied into my Xcode bundle, and when I try to add data from an image as an input, I get the error: Input 0 should have 602112 bytes, but found 627941 bytes. I am using the following code:
let input = ModelInputs()
do {
let newImage = image.resizeTo(size: CGSize(width: 224, height: 224))
let data = UIImagePNGRepresentation(newImage)
// Store input data in `data`
// ...
try input.addInput(data)
// Repeat as necessary for each input index
} catch let error as NSError {
print("Failed to add input: \(error.localizedDescription)")
}
interpreter.run(inputs: input, options: ioOptions) { outputs, error in
guard error == nil, let outputs = outputs else {
print(error!.localizedDescription)//ERROR BEING CALLED HERE
return }
// Process outputs
print(outputs)
// ...
}
How can I re-process the image data to be 602112 bytes? I am so confused if someone could please help me it would be great :)
Please check out the Quick Start iOS demo app in Swift on how to use a custom TFLite model:
https://github.com/firebase/quickstart-ios/tree/master/mlmodelinterpreter
In particular, I think this is what you are looking for:
https://github.com/firebase/quickstart-ios/blob/master/mlmodelinterpreter/MLModelInterpreterExample/UIImage%2BTFLite.swift#L47
Good luck!

Why is the Vision framework unable to align two images?

I'm trying to take two images using the camera, and align them using the iOS Vision framework:
func align(firstImage: CIImage, secondImage: CIImage) {
let request = VNTranslationalImageRegistrationRequest(
targetedCIImage: firstImage) {
request, error in
if error != nil {
fatalError()
}
let observation = request.results!.first
as! VNImageTranslationAlignmentObservation
secondImage = secondImage.transformed(
by: observation.alignmentTransform)
let compositedImage = firstImage!.applyingFilter(
"CIAdditionCompositing",
parameters: ["inputBackgroundImage": secondImage])
// Save the compositedImage to the photo library.
}
try! visionHandler.perform([request], on: secondImage)
}
let visionHandler = VNSequenceRequestHandler()
But this produces grossly mis-aligned images:
You can see that I've tried three different types of scenes — a close-up subject, an indoor scene, and an outdoor scene. I tried more outdoor scenes, and the result is the same in almost every one of them.
I was expecting a slight misalignment at worst, but not such a complete misalignment. What is going wrong?
I'm not passing the orientation of the images into the Vision framework, but that shouldn't be a problem for aligning images. It's a problem only for things like face detection, where a rotated face isn't detected as a face. In any case, the output images have the correct orientation, so orientation is not the problem.
My compositing code is working correctly. It's only the Vision framework that's a problem. If I remove the calls to the Vision framework, put the phone of a tripod, the composition works perfectly. There's no misalignment. So the problem is the Vision framework.
This is on iPhone X.
How do I get Vision framework to work correctly? Can I tell it to use gyroscope, accelerometer and compass data to improve the alignment?
You should set secondImage as targetImage, and perform handler with firstImage.
I use your composite way.
check out this example from MLBoy:
let request = VNTranslationalImageRegistrationRequest(targetedCIImage: image2, options: [:])
let handler = VNImageRequestHandler(ciImage: image1, options: [:])
do {
try handler.perform([request])
} catch let error {
print(error)
}
guard let observation = request.results?.first as? VNImageTranslationAlignmentObservation else { return }
let alignmentTransform = observation.alignmentTransform
image2 = image2.transformed(by: alignmentTransform)
let compositedImage = image1.applyingFilter("CIAdditionCompositing", parameters: ["inputBackgroundImage": image2])

How to get object rect/coordinates from VNClassificationObservation

have an issue with getting from VNClassificationObservation.
My goal id to recognize the object and display popup with the object name, I'm able to get name but I can't get object coordinates or frame.
Here is code:
let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: requestOptions)
do {
try handler.perform([classificationRequest, detectFaceRequest])
} catch {
print(error)
}
Then I handle
func handleClassification(request: VNRequest, error: Error?) {
guard let observations = request.results as? [VNClassificationObservation] else {
fatalError("unexpected result type from VNCoreMLRequest")
}
// Filter observation
let filteredOservations = observations[0...10].filter({ $0.confidence > 0.1 })
// Update UI
DispatchQueue.main.async { [weak self] in
for observation in filteredOservations {
print("observation: ",observation.identifier)
//HERE: I need to display popup with observation name
}
}
}
UPDATED:
lazy var classificationRequest: VNCoreMLRequest = {
// Load the ML model through its generated class and create a Vision request for it.
do {
let model = try VNCoreMLModel(for: Inceptionv3().model)
let request = VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
request.imageCropAndScaleOption = VNImageCropAndScaleOptionCenterCrop
return request
} catch {
fatalError("can't load Vision ML model: \(error)")
}
}()
A pure classifier model can only answer "what is this a picture of?", not detect and locate objects in the picture. All the free models on the Apple developer site (including Inception v3) are of this kind.
When Vision works with such a model, it identifies the model as a classifier based on the outputs declared in the MLModel file, and returns VNClassificationObservation objects as output.
If you find or create a model that's trained to both identify and locate objects, you can still use it with Vision. When you convert that model to Core ML format, the MLModel file will describe multiple outputs. When Vision works with a model that has multiple outputs, it returns an array of VNCoreMLFeatureValueObservation objects — one for each output of the model.
How the model declares its outputs would determine which feature values represent what. A model that reports a classification and a bounding box could output a string and four doubles, or a string and a multi array, etc.
Addendum: Here's a model that works on iOS 11 and returns VNCoreMLFeatureValueObservation: TinyYOLO
That's because classifiers do not return objects coordinates or frames. A classifier only gives a probability distribution over a list of categories.
What model are you using here?
For tracking and identifying objects, you’ll have to create your own model using Darknet. I have struggled the same problem, and used TuriCreate to train model, and instead of just providing images to the framework, you’ll have also to provide ones with bounding boxes. Apple have documented here, how to create those models:
Apple TuriCreate docs

Resources