Swift: Process UIImage data for use in Firebase custom TFLite model - ios

I am using Swift, Firebase, and TensorFlow to build an image recognition model. I have a re-trained MobileNet model that takes an input array of [1,224,224,3], copied into my Xcode bundle. When I try to add data from an image as an input, I get the error: Input 0 should have 602112 bytes, but found 627941 bytes. I am using the following code:
let input = ModelInputs()
do {
    let newImage = image.resizeTo(size: CGSize(width: 224, height: 224))
    let data = UIImagePNGRepresentation(newImage)
    // Store input data in `data`
    // ...
    try input.addInput(data)
    // Repeat as necessary for each input index
} catch let error as NSError {
    print("Failed to add input: \(error.localizedDescription)")
}
interpreter.run(inputs: input, options: ioOptions) { outputs, error in
    guard error == nil, let outputs = outputs else {
        print(error!.localizedDescription) // ERROR IS REPORTED HERE
        return
    }
    // Process outputs
    print(outputs)
    // ...
}
How can I re-process the image data so that it is 602112 bytes? I am quite confused; if someone could help me, it would be great :)

Please check out the Quick Start iOS demo app in Swift on how to use a custom TFLite model:
https://github.com/firebase/quickstart-ios/tree/master/mlmodelinterpreter
In particular, I think this is what you are looking for:
https://github.com/firebase/quickstart-ios/blob/master/mlmodelinterpreter/MLModelInterpreterExample/UIImage%2BTFLite.swift#L47
Good luck!
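For reference, 224 × 224 pixels × 3 channels × 4 bytes per Float32 is exactly 602112 bytes, while a PNG representation is compressed and will essentially never match that size. Below is a minimal sketch (my own, not the quickstart's scaledData extension) of producing the raw buffer, assuming the model expects RGB values normalized to [0, 1] as Float32; adjust the normalization if your model expects something else.
import UIKit

extension UIImage {
    // Returns a 1 x 224 x 224 x 3 Float32 buffer (602112 bytes) or nil on failure.
    func mobileNetData(size: CGSize = CGSize(width: 224, height: 224)) -> Data? {
        guard let cgImage = self.cgImage else { return nil }
        let width = Int(size.width), height = Int(size.height)
        let bytesPerPixel = 4
        var pixels = [UInt8](repeating: 0, count: width * height * bytesPerPixel)

        // Draw the image into a fixed-size RGBA bitmap so we control the pixel layout.
        let drawn: Bool = pixels.withUnsafeMutableBytes { buffer in
            guard let context = CGContext(data: buffer.baseAddress,
                                          width: width,
                                          height: height,
                                          bitsPerComponent: 8,
                                          bytesPerRow: width * bytesPerPixel,
                                          space: CGColorSpaceCreateDeviceRGB(),
                                          bitmapInfo: CGImageAlphaInfo.noneSkipLast.rawValue)
            else { return false }
            context.draw(cgImage, in: CGRect(origin: .zero, size: size))
            return true
        }
        guard drawn else { return nil }

        // Drop the alpha channel and convert each RGB byte to a normalized Float32.
        var floats = [Float32]()
        floats.reserveCapacity(width * height * 3)
        for pixelIndex in 0..<(width * height) {
            let offset = pixelIndex * bytesPerPixel
            floats.append(Float32(pixels[offset]) / 255.0)     // R
            floats.append(Float32(pixels[offset + 1]) / 255.0) // G
            floats.append(Float32(pixels[offset + 2]) / 255.0) // B
        }
        // 224 * 224 * 3 floats * 4 bytes each = 602112 bytes
        return floats.withUnsafeBufferPointer { Data(buffer: $0) }
    }
}
You would then pass the result of mobileNetData() to input.addInput(_:) instead of the PNG data.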

Related

Unable to detect QRCode from image using MLKit

I am using ML Kit to detect QR codes from an image. For Android it works properly; for iOS I am using the pod below:
pod 'GoogleMLKit/BarcodeScanning'
Here is sample code that detects a QR code from an image picked from the gallery. Every time, the features array comes back empty.
let format: BarcodeFormat = BarcodeFormat.all
let barcodeOptions = BarcodeScannerOptions(formats: format)
let visionImage = VisionImage(image: image)
visionImage.orientation = image.imageOrientation
let barcodeScanner = BarcodeScanner.barcodeScanner(options: barcodeOptions)
barcodeScanner.process(visionImage) { features, error in
    guard error == nil, let features = features, !features.isEmpty else {
        // Error handling
        return
    }
    // Recognized barcodes
    print("Data :: \(features.first?.rawValue ?? "")")
}
We noticed this may happen when there is no padding around the QR code. I tried adding some padding to the image, and it works after that. Could you confirm that it works for you as well?
On the other side, ML Kit is also working on a public document about this limitation. Thanks for reporting it.
Julie from ML Kit team
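For anyone hitting the same issue, here is a minimal sketch (mine, not from the ML Kit team) of adding a white border around a UIImage before handing it to the scanner; the 40-point inset is an arbitrary choice.
import UIKit

extension UIImage {
    // Returns a copy of the image with a white border of `inset` points on every side.
    func withPadding(_ inset: CGFloat = 40) -> UIImage {
        let paddedSize = CGSize(width: size.width + inset * 2,
                                height: size.height + inset * 2)
        let renderer = UIGraphicsImageRenderer(size: paddedSize)
        return renderer.image { context in
            // Fill the background, then draw the original image centered inside it.
            UIColor.white.setFill()
            context.fill(CGRect(origin: .zero, size: paddedSize))
            draw(in: CGRect(origin: CGPoint(x: inset, y: inset), size: size))
        }
    }
}

// Usage: let visionImage = VisionImage(image: image.withPadding())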

CoreML Output labels NSCFString - Labels not showing correctly

I am working on an iOS app where I need to use a CoreML model to perform image classification.
I used Google Cloud Platform AutoML Vision to train the model. Google provides a CoreML version of the model and I downloaded it to use in my app.
I followed Google's tutorial and everything appeared to be going smoothly. However, when it was time to start using the model, I got a very strange prediction: I got the confidence of the prediction, and then a very strange string that I didn't know what it was.
<VNClassificationObservation: 0x600002091d40> A7DBD70C-541C-4112-84A4-C6B4ED2EB7E2 requestRevision=1 confidence=0.332127 "CICAgICAwPmveRIJQWdsYWlzX2lv"
The string I am referring to is CICAgICAwPmveRIJQWdsYWlzX2lv.
After some research and debugging, I found out that this is an NSCFString.
https://developer.apple.com/documentation/foundation/1395135-nsclassfromstring
Apparently this is part of the Foundation API. Does anyone have any experience with this?
The Core ML file also comes with a dict.txt file containing the correct labels. Do I have to convert this string to those labels? How do I do that?
This is the code I have so far.
//
//  Classification.swift
//  Lepidoptera
//
//  Created by Tomás Mamede on 15/09/2020.
//  Copyright © 2020 Tomás Santiago. All rights reserved.
//

import Foundation
import SwiftUI
import Vision
import CoreML
import ImageIO

class Classification {

    private lazy var classificationRequest: VNCoreMLRequest = {
        do {
            let model = try VNCoreMLModel(for: AutoML().model)
            let request = VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
                if let classifications = request.results as? [VNClassificationObservation] {
                    print(classifications.first ?? "No classification!")
                }
            })
            request.imageCropAndScaleOption = .scaleFit
            return request
        } catch {
            fatalError("Error! Can't use Model.")
        }
    }()

    func classifyImage(receivedImage: UIImage) {
        let orientation = CGImagePropertyOrientation(rawValue: UInt32(receivedImage.imageOrientation.rawValue))
        if let image = CIImage(image: receivedImage) {
            DispatchQueue.global(qos: .userInitiated).async {
                let handler = VNImageRequestHandler(ciImage: image, orientation: orientation!)
                do {
                    try handler.perform([self.classificationRequest])
                } catch {
                    fatalError("Error classifying image!")
                }
            }
        }
    }
}
The labels are stored in your mlmodel file. If you open the mlmodel in the Xcode 12 model viewer, it will display what those labels are.
My guess is that instead of actual labels, your mlmodel file contains "CICAgICAwPmveRIJQWdsYWlzX2lv" and so on.
It looks like Google's AutoML does not put the correct class labels into the Core ML model.
You can make a dictionary in the app that maps "CICAgICAwPmveRIJQWdsYWlzX2lv" and so on to the real labels.
Or you can replace these labels inside the mlmodel file by editing it using coremltools. (My e-book Core ML Survival Guide has a chapter on how to replace the labels in the model.)
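If you go with the in-app dictionary approach, a minimal sketch could look like the following. The key is the opaque identifier returned by the model and the value is the human-readable label; the "Aglais_io" entry here is only a hypothetical example, and the real values would come from the dict.txt that ships with the model.
import Vision

// Map the model's opaque identifiers to readable labels (one entry per line of dict.txt).
let labelMap: [String: String] = [
    "CICAgICAwPmveRIJQWdsYWlzX2lv": "Aglais_io"
    // ...
]

func readableLabel(for observation: VNClassificationObservation) -> String {
    // Fall back to the raw identifier if it is not in the map.
    labelMap[observation.identifier] ?? observation.identifier
}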

Why my pre-trained mlmodel is so wrong in object recognition?

Recently I wanted to check out Core ML and Create ML, so I created a simple app with object recognition.
I created a model only for bananas and carrots (just as a trial). I used over 60 images to train my model, and in the Create ML app the training process looked fine.
Everything was going great until I printed the results in the console and saw that my model is 100% confident that a waterfall is a banana...
Ideally, I thought the output would be 0% confidence for banana and 0% confidence for carrot (because I used an image of a waterfall).
Could you explain why the output looks like this and give any advice on how to improve my app?
This is my code for image recognition:
func recognizeObject(image: CIImage) {
    guard let myModel = try? VNCoreMLModel(for: FruitVegeClassifier_1().model) else {
        fatalError("Couldn't load ML Model")
    }
    let recognizeRequest = VNCoreMLRequest(model: myModel) { (recognizeRequest, error) in
        guard let output = recognizeRequest.results as? [VNClassificationObservation] else {
            fatalError("Your model failed!")
        }
        print(output)
    }
    let handler = VNImageRequestHandler(ciImage: image)
    do {
        try handler.perform([recognizeRequest])
    } catch {
        print(error)
    }
}
In the console we can see:
[<VNClassificationObservation: 0x600001c77810> 24503983-5770-4F43-8078-F3F6243F47B2 requestRevision=1 confidence=1.000000 "banana", <VNClassificationObservation: 0x600001c77840> E73BFBAE-D6E1-4D31-A2AE-0B3C860EAF99 requestRevision=1 confidence=0.000000 "carrot"]
and the image looks like this (a photo of a waterfall; image not shown here):
Thanks for any help !
If you only trained on images of bananas and carrots, the model should only be used on images of bananas and carrots.
When you give it a totally different kind of images, it will try to match it to the patterns it has learned, which are either bananas or carrots and nothing else.
In other words, these models do not work the way you were expecting them to.
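To illustrate the point, here is a small sketch (my own) that sums the confidences of the returned observations: a classifier distributes all of its confidence across the classes it was trained on, so the values always add up to roughly 1.0, even for an image (like a waterfall) that belongs to none of them.
import Vision

func logDistribution(_ observations: [VNClassificationObservation]) {
    let total = observations.reduce(Float(0)) { $0 + $1.confidence }
    for observation in observations {
        print("\(observation.identifier): \(observation.confidence)")
    }
    print("sum of confidences: \(total)") // ~1.0, no matter what the image shows
}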

How to get specific information of image using Firebase-CloudVision(ML)

I am using the Firebase Cloud Vision (ML) API to read an image.
I am able to get information about the image back, but it is not specific.
Example: when I take and upload a picture of a MacBook, it gives output like "notebook, laptop, electronic device, etc.".
But I want to get its brand name, like "Apple MacBook"; I have seen a few apps doing this.
I could not find any information regarding this, so I am posting here.
Please suggest or point me in the right direction if anyone has come across this.
My Code:
func pickedImage(image: UIImage) {
    imageView.image = image
    imageView.contentMode = .scaleAspectFit
    guard let image = imageView.image else { return }

    // let onCloudLabeler = Vision.vision().cloudImageLabeler(options: options)
    let onCloudLabeler = Vision.vision().cloudImageLabeler()

    // Define the metadata for the image.
    let imageMetadata = VisionImageMetadata()
    imageMetadata.orientation = .topLeft

    // Initialize a VisionImage object with the given UIImage.
    let visionImage = VisionImage(image: image)
    visionImage.metadata = imageMetadata

    onCloudLabeler.process(visionImage) { labels, error in
        guard error == nil, let labels = labels, !labels.isEmpty else {
            // [START_EXCLUDE]
            let errorString = error?.localizedDescription ?? "No results returned."
            print("Label detection failed with error: \(errorString)")
            // self.showResults()
            // [END_EXCLUDE]
            return
        }
        // [START_EXCLUDE]
        var results = [String]()
        let resultsText = labels.map { label -> String in
            results.append(label.text)
            return "Label: \(label.text), " +
                "Confidence: \(label.confidence ?? 0), " +
                "EntityID: \(label.entityID ?? "")"
        }.joined(separator: "\n")
        // self.showResults()
        // [END_EXCLUDE]
        print(results.count)
        print(resultsText)
        self.labelTxt.text = results.joined(separator: ",")
        results.removeAll()
    }
}
If you've seen other apps doing something that your app doesn't do, those other apps are likely using a different ML model than the one you're using.
If you want to accomplish the same using ML Kit for Firebase, you can use a custom model that you either trained yourself or got from another source.
As Puf said, the apps you saw are probably using their own custom ML model. ML Kit now supports creating custom image classification models from your own training data. Check out the AutoML Vision Edge functionality here: https://firebase.google.com/docs/ml-kit/automl-vision-edge

How to get object rect/coordinates from VNClassificationObservation

I have an issue with getting coordinates from VNClassificationObservation.
My goal is to recognize an object and display a popup with the object's name; I'm able to get the name, but I can't get the object's coordinates or frame.
Here is the code:
let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: requestOptions)
do {
    try handler.perform([classificationRequest, detectFaceRequest])
} catch {
    print(error)
}
Then I handle the results:
func handleClassification(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNClassificationObservation] else {
        fatalError("unexpected result type from VNCoreMLRequest")
    }
    // Filter observations
    let filteredObservations = observations[0...10].filter({ $0.confidence > 0.1 })
    // Update UI
    DispatchQueue.main.async { [weak self] in
        for observation in filteredObservations {
            print("observation: ", observation.identifier)
            // HERE: I need to display a popup with the observation name
        }
    }
}
UPDATED:
lazy var classificationRequest: VNCoreMLRequest = {
    // Load the ML model through its generated class and create a Vision request for it.
    do {
        let model = try VNCoreMLModel(for: Inceptionv3().model)
        let request = VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
        request.imageCropAndScaleOption = .centerCrop
        return request
    } catch {
        fatalError("can't load Vision ML model: \(error)")
    }
}()
A pure classifier model can only answer "what is this a picture of?", not detect and locate objects in the picture. All the free models on the Apple developer site (including Inception v3) are of this kind.
When Vision works with such a model, it identifies the model as a classifier based on the outputs declared in the MLModel file, and returns VNClassificationObservation objects as output.
If you find or create a model that's trained to both identify and locate objects, you can still use it with Vision. When you convert that model to Core ML format, the MLModel file will describe multiple outputs. When Vision works with a model that has multiple outputs, it returns an array of VNCoreMLFeatureValueObservation objects — one for each output of the model.
How the model declares its outputs would determine which feature values represent what. A model that reports a classification and a bounding box could output a string and four doubles, or a string and a multi array, etc.
Addendum: Here's a model that works on iOS 11 and returns VNCoreMLFeatureValueObservation: TinyYOLO
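For illustration, here is a minimal sketch (a hypothetical completion handler, not tied to TinyYOLO's exact output layout) of handling a model with multiple declared outputs: Vision returns one VNCoreMLFeatureValueObservation per output, and your code decodes each feature value itself.
import CoreML
import Vision

func handleDetection(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNCoreMLFeatureValueObservation] else {
        return
    }
    for observation in observations {
        let value = observation.featureValue
        switch value.type {
        case .multiArray:
            // e.g. a grid of box coordinates and class scores to decode into rects yourself
            print("multi array output with shape \(value.multiArrayValue?.shape ?? [])")
        case .string:
            print("string output: \(value.stringValue)")
        default:
            break
        }
    }
}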
That's because classifiers do not return objects coordinates or frames. A classifier only gives a probability distribution over a list of categories.
What model are you using here?
For tracking and identifying objects, you'll have to create your own model, for example using Darknet. I struggled with the same problem and used Turi Create to train a model; instead of just providing images to the framework, you also have to provide them with bounding boxes. Apple has documented how to create those models here:
Apple TuriCreate docs
