I am creating an app that detects exercises. I trained the model using Create ML and got a 100% result in the Create ML application. But when I integrate it into the app using the Vision framework, it always shows only one exercise. I followed the code from Build an Action Classifier with Create ML exactly for creating the model and requesting VNHumanBodyPoseObservation. I followed this for converting VNHumanBodyPoseObservation to MLMultiArray.
Here is my code:
func didOutput(pixelBuffer: CVPixelBuffer) {
self.extractPoses(pixelBuffer)
}
func extractPoses(_ pixelBuffer: CVPixelBuffer) {
let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer)
let request = VNDetectHumanBodyPoseRequest { (request, err) in
if err == nil {
if let observations =
request.results as? [VNRecognizedPointsObservation], observations.count > 0 {
if let prediction = try? self.makePrediction(observations) {
print("\(prediction.label), confidence: \(prediction.confidence)")
}
}
}
}
do {
// Perform the body pose-detection request.
try handler.perform([request])
} catch {
print("Unable to perform the request: \(error).\n")
}
}
func makePrediction(_ observations: [VNRecognizedPointsObservation]) throws -> (label: String, confidence: Double) {
let fitnessClassifier = try PlayerExcercise(configuration: MLModelConfiguration())
let numAvailableFrames = observations.count
let observationsNeeded = 60
var multiArrayBuffer = [MLMultiArray]()
for frameIndex in 0 ..< min(numAvailableFrames, observationsNeeded) {
let pose = observations[frameIndex]
do {
let oneFrameMultiArray = try pose.keypointsMultiArray()
multiArrayBuffer.append(oneFrameMultiArray)
} catch {
continue
}
}
// If there are fewer frames than observationsNeeded (60), pad the window with zeros
if numAvailableFrames < observationsNeeded {
for _ in 0 ..< (observationsNeeded - numAvailableFrames) {
do {
let oneFrameMultiArray = try MLMultiArray(shape: [1, 3, 18], dataType: .double)
try resetMultiArray(oneFrameMultiArray)
multiArrayBuffer.append(oneFrameMultiArray)
} catch {
continue
}
}
}
let modelInput = MLMultiArray(concatenating: [MLMultiArray](multiArrayBuffer), axis: 0, dataType: .float)
//
//
let predictions = try fitnessClassifier.prediction(poses: modelInput)
return (label: predictions.label, confidence: predictions.labelProbabilities[predictions.label]!)
}
func resetMultiArray(_ predictionWindow: MLMultiArray, with value: Double = 0.0) throws {
let pointer = try UnsafeMutableBufferPointer<Double>(predictionWindow)
pointer.initialize(repeating: value)
}
I suspect the issue happens while converting VNRecognizedPointsObservation to MLMultiArray. Please help me; I have been trying hard to get this working. Thanks in advance.
Are you running your app on a simulator? I had the same issue: the model predicted wrong results when I ran my image classifier app on an iPhone 12 simulator, but the issue went away when I ran the app on a real device. So there may be nothing wrong with your model or code; try running it on a real device and see if you get the intended results.
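To make the simulator case visible while debugging, a small compile-time check can be logged next to each prediction. This is only a sketch: `isRunningOnSimulator` and `predictionEnvironmentNote` are hypothetical helper names, not part of Vision or Core ML; the `#if targetEnvironment(simulator)` condition is standard Swift.

```swift
/// Returns true when the binary was compiled for a simulator target.
func isRunningOnSimulator() -> Bool {
    #if targetEnvironment(simulator)
    return true
    #else
    return false
    #endif
}

/// Hypothetical helper: a short note to log alongside each prediction.
func predictionEnvironmentNote() -> String {
    return isRunningOnSimulator()
        ? "Simulator build: verify predictions on a physical device."
        : "Device build."
}

print(predictionEnvironmentNote())
```

If the note says "Simulator build" and the label never changes, that is a cue to repeat the test on hardware before changing the model or the conversion code.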
Overview
I have 2 cycling workouts (recorded using the Workouts app on an Apple Watch SE) that I'm trying to retrieve the location data (GPX samples) from. They were part of the same continuous ride that I recorded using 2 separate workouts due to the original workout being unable to unpause midway through. Ultimately, I'd like to obtain all location samples and merge them into a single file.
Current State
The health app on my iPhone (SE, 2020) correctly shows 2 workouts on the day I recorded them (July 16th), and also contains 4 workout route objects for the 2 workouts: 1 that makes up the entirety of the first workout, and 3 that combine to make up the second workout.
Screenshot showing 2 workouts
Screenshot showing 4 workout routes
When I export the raw health data however, only 3 workout routes show up for July 16th. And they're all for the second workout. No GPX route file is exported for the first workout.
Attempt to Solve
In an attempt to try and access the raw data from HealthKit directly, I found an app on Github, built it, and loaded it onto my iPhone to see what I could get. The app retrieves a list of workouts, and exports a GPX file with all location samples when a workout is selected. This works great with the second of my cycling workouts, but crashes when trying to export the data from the first workout.
Here's the function that actually queries the workout for route information:
public func route(for workout: HKWorkout, completion: @escaping (([CLLocation]?, Error?) -> Swift.Void)) {
let routeType = HKSeriesType.workoutRoute();
let p = HKQuery.predicateForObjects(from: workout)
let sortDescriptor = NSSortDescriptor(key: HKSampleSortIdentifierStartDate, ascending: true)
let q = HKSampleQuery(sampleType: routeType, predicate: p, limit: HKObjectQueryNoLimit, sortDescriptors: [sortDescriptor]) {
(query, samples, error) in
if let err = error {
print(err)
return
}
guard let routeSamples: [HKWorkoutRoute] = samples as? [HKWorkoutRoute] else { print("No route samples"); return }
if (routeSamples.count == 0){
completion([CLLocation](), nil)
return;
}
var sampleCounter = 0
var routeLocations:[CLLocation] = []
for routeSample: HKWorkoutRoute in routeSamples {
let locationQuery: HKWorkoutRouteQuery = HKWorkoutRouteQuery(route: routeSample) { _, locationResults, done, error in
guard locationResults != nil else {
print("Error occured while querying for locations: \(error?.localizedDescription ?? "")")
DispatchQueue.main.async {
completion(nil, error)
}
return
}
if done {
sampleCounter += 1
if sampleCounter != routeSamples.count {
if let locations = locationResults {
routeLocations.append(contentsOf: locations)
}
} else {
if let locations = locationResults {
routeLocations.append(contentsOf: locations)
let sortedLocations = routeLocations.sorted(by: {$0.timestamp < $1.timestamp})
DispatchQueue.main.async {
completion(sortedLocations, error)
}
}
}
} else {
if let locations = locationResults {
routeLocations.append(contentsOf: locations)
}
}
}
self.healthStore.execute(locationQuery)
}
}
healthStore.execute(q)
}
After further debugging, it appears that the HKSampleQuery call is returning 0 samples when querying the first cycling workout. It returns 3 samples (each of the workout route objects) when querying the second cycling workout.
So it's as if the HealthKit API can't see the 4th workout route, even though it clearly exists as evidenced by the screenshot from the health app.
Question
What's going on here? Why does the second workout return data just fine but the first doesn't?
Is it possible to query the health database directly to obtain this data? I have a local encrypted iPhone backup with a known passcode I can access as well.
This cycling ride was pretty important to me, so I'm trying all I can to retrieve the location data. Thanks in advance!
Edit: Adding main code where the route() function gets called.
override func tableView(_ tableView: UITableView, didSelectRowAt indexPath: IndexPath) {
print(indexPath);
guard let workouts = self.workouts else {
return;
}
if (indexPath.row >= workouts.count){
return;
}
print(indexPath.row)
let workout = workouts[indexPath.row];
let workout_name: String = {
switch workout.workoutActivityType {
case .cycling: return "Cycle"
case .running: return "Run"
case .walking: return "Walk"
default: return "Workout"
}
}()
let workout_title = "\(workout_name) - \(self.dateFormatter.string(from: workout.startDate))"
let file_name = "\(self.filenameDateFormatter.string(from: workout.startDate)) - \(workout_name)"
let targetURL = URL(fileURLWithPath: NSTemporaryDirectory())
.appendingPathComponent(file_name)
.appendingPathExtension("gpx")
let file: FileHandle
do {
let manager = FileManager.default;
if manager.fileExists(atPath: targetURL.path){
try manager.removeItem(atPath: targetURL.path)
}
print(manager.createFile(atPath: targetURL.path, contents: Data()))
file = try FileHandle(forWritingTo: targetURL);
} catch let err {
print(err)
return
}
workoutStore.heartRate(for: workouts[indexPath.row]){
(rates, error) in
guard let keyedRates = rates, error == nil else {
print(error as Any);
return
}
let iso_formatter = ISO8601DateFormatter()
var current_heart_rate_index = 0;
var current_hr: Double = -1;
let bpm_unit = HKUnit(from: "count/min")
var hr_string = "";
file.write(
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><gpx version=\"1.1\" creator=\"Apple Workouts (via pilif's hack of the week)\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns=\"http://www.topografix.com/GPX/1/1\" xsi:schemaLocation=\"http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd\" xmlns:gpxtpx=\"http://www.garmin.com/xmlschemas/TrackPointExtension/v1\"><trk><name><![CDATA[\(workout_title)]]></name><time>\(iso_formatter.string(from: workout.startDate))</time><trkseg>"
.data(using: .utf8)!
)
self.workoutStore.route(for: workouts[indexPath.row]){
(maybe_locations, error) in
guard let locations = maybe_locations, error == nil else {
print(error as Any);
file.closeFile()
return
}
for location in locations {
while (current_heart_rate_index < keyedRates.count) && (location.timestamp > keyedRates[current_heart_rate_index].startDate) {
current_hr = keyedRates[current_heart_rate_index].quantity.doubleValue(for: bpm_unit)
current_heart_rate_index += 1;
hr_string = "<extensions><gpxtpx:TrackPointExtension><gpxtpx:hr>\(current_hr)</gpxtpx:hr></gpxtpx:TrackPointExtension></extensions>"
}
file.write(
"<trkpt lat=\"\(location.coordinate.latitude)\" lon=\"\(location.coordinate.longitude)\"><ele>\(location.altitude.magnitude)</ele><time>\(iso_formatter.string(from: location.timestamp))</time>\(hr_string)</trkpt>"
.data(using: .utf8)!
)
}
file.write("</trkseg></trk></gpx>".data(using: .utf8)!)
file.closeFile()
let activityViewController = UIActivityViewController( activityItems: [targetURL],
applicationActivities: nil)
if let popoverPresentationController = activityViewController.popoverPresentationController {
popoverPresentationController.barButtonItem = nil
}
self.present(activityViewController, animated: true, completion: nil)
}
}
}
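For the stated goal of merging the location samples from both workouts into one track, the last step (once both route queries return data) could look like the sketch below. `TrackPoint` and `mergeTracks` are hypothetical stand-ins so the example runs without CoreLocation; the real samples are CLLocation values. It does nothing to recover the route that the query fails to return.

```swift
import Foundation

// Hypothetical stand-in for CLLocation, so the sketch runs without CoreLocation.
struct TrackPoint: Equatable {
    let timestamp: Date
    let latitude: Double
    let longitude: Double
}

// Flattens the per-workout tracks and orders all points by timestamp.
func mergeTracks(_ tracks: [[TrackPoint]]) -> [TrackPoint] {
    return tracks.flatMap { $0 }.sorted { $0.timestamp < $1.timestamp }
}

let start = Date(timeIntervalSince1970: 0)
let firstWorkout = [
    TrackPoint(timestamp: start, latitude: 1.0, longitude: 10.0),
    TrackPoint(timestamp: start.addingTimeInterval(10), latitude: 2.0, longitude: 20.0),
]
let secondWorkout = [
    TrackPoint(timestamp: start.addingTimeInterval(20), latitude: 3.0, longitude: 30.0),
]

// The input order does not matter; the result is ordered by time.
let merged = mergeTracks([secondWorkout, firstWorkout])
```

The merged array can then be written out with the same trkpt loop used above, which already expects locations sorted by timestamp.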
I'm working on a watchOS App as my first Swift/iOS project ever. I want to fetch the latest body weight sample and use it for some calculation. The result is presented to the user. As soon as a new sample is added, I want to update my UI as well. It works in a completely fresh simulator installation. As soon as I add a sample in the iOS simulator, the app updates its UI in the watchOS simulator. However, it doesn't work on my real device or after resetting the watchOS simulator. And I just don't know why. The HKAnchoredObjectQuery just returns 0 samples but I definitely have some samples stored in health. I can even see them under Settings > Health on my watch. I can't imagine this is related to my code, but here it is:
class WeightProvider: ObservableObject {
private static let weightSampleType = HKSampleType.quantityType(forIdentifier: .bodyMass)!
private static let healthStore: HKHealthStore = .init()
private var previousAnchor: HKQueryAnchor?
private var runningQuery: HKAnchoredObjectQuery?
@Published var bodyWeight: Measurement<UnitMass>?
func getBodyWeight(longRunning: Bool = false) {
let query = HKAnchoredObjectQuery(type: Self.weightSampleType, predicate: nil, anchor: previousAnchor, limit: longRunning ? HKObjectQueryNoLimit : 1, resultsHandler: processQueryResult)
if longRunning {
query.updateHandler = processQueryResult
runningQuery = query
}
Self.healthStore.execute(query)
}
func stopLongRunningQuery() {
if let runningQuery = runningQuery {
Self.healthStore.stop(runningQuery)
self.runningQuery = nil
}
}
private func processQueryResult(_: HKAnchoredObjectQuery, samples: [HKSample]?, _: [HKDeletedObject]?, newAnchor: HKQueryAnchor?, error: Error?) {
guard let samples = samples as? [HKQuantitySample], error == nil else {
fatalError(error?.localizedDescription ?? "Failed to cast [HKSample] to [HKQuantitySample]")
}
previousAnchor = newAnchor
guard let sample = samples.last else {
return
}
DispatchQueue.main.async {
if Locale.current.usesMetricSystem {
let weight = sample.quantity.doubleValue(for: .gramUnit(with: .kilo))
self.bodyWeight = .init(value: weight, unit: UnitMass.kilograms)
} else {
let weight = sample.quantity.doubleValue(for: .pound())
self.bodyWeight = .init(value: weight, unit: UnitMass.pounds)
}
}
}
}
// MARK: - HealthKit Authorization
extension WeightProvider {
private static let typesToRead: Set<HKObjectType> = [
weightSampleType,
]
func authorize(completion: @escaping (Bool, Error?) -> Swift.Void) {
Self.healthStore.requestAuthorization(toShare: nil, read: Self.typesToRead) { success, error in
completion(success, error)
}
}
}
In my Views onAppear I call this function:
private func authorizeHealthKit() {
guard firstRun else {
return
}
firstRun = false
weightProvider.authorize { success, error in
guard success, error == nil else {
return
}
weightProvider.getBodyWeight(longRunning: true)
}
}
HealthKit is properly authorized as I can see in the Settings of my Watch. Any ideas? Any tips for my code in general?
Wow, after all this time I found the issue: The line previousAnchor = newAnchor needs to be after the guard statement. That's it.
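The reordering can be modelled without HealthKit. The sketch below assumes "the guard statement" means the `guard let sample = samples.last` check, since the assignment already follows the first guard. `Anchor` and `AnchorTracker` are hypothetical stand-ins for HKQueryAnchor and WeightProvider, and samples are plain Double values.

```swift
// Hypothetical stand-in for HKQueryAnchor.
struct Anchor: Equatable {
    let id: Int
}

// Hypothetical stand-in for WeightProvider, reduced to the anchor logic.
final class AnchorTracker {
    private(set) var previousAnchor: Anchor?
    private(set) var latestValue: Double?

    // Mirrors processQueryResult: the anchor is stored only after the
    // guard has confirmed that the batch contains at least one sample.
    func process(samples: [Double], newAnchor: Anchor?) {
        guard let sample = samples.last else {
            return // empty batch: keep the old anchor
        }
        previousAnchor = newAnchor
        latestValue = sample
    }
}

let tracker = AnchorTracker()
tracker.process(samples: [], newAnchor: Anchor(id: 1))      // anchor stays nil
let anchorAfterEmptyBatch = tracker.previousAnchor
tracker.process(samples: [72.5], newAnchor: Anchor(id: 2))  // anchor advances
```

With this ordering an empty batch leaves the stored anchor untouched, so a later query started from `previousAnchor` is not moved past samples that were never delivered.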
I trained a CNN classification model with RGB images as input; it produces a 1x7 output with the probabilities of the class labels (7 different classes). I converted the model from Keras .h5 to Core ML. I have seen different approaches and tried both of them, with and without class labels defined. Neither caused any issue during conversion. However, neither works on iOS. Both models crash when I call the line below:
guard let result = predictionRequest.results as? [VNCoreMLFeatureValueObservation] else {
fatalError("model failed to process image")
}
The output definitions of both models are below. Could you please advise what is wrong with the model output? Do I have to add class labels or not? I am confused about how to get the most probable value. I have added the entire classification code too; please see below. Since I am a beginner in iOS, your help is greatly appreciated. Thanks a lot.
Model output definition in iOS with class labels conversion:
/// Identity as dictionary of strings to doubles
lazy var Identity: [String : Double] = {
[unowned self] in return self.provider.featureValue(for: "Identity")!.dictionaryValue as! [String : Double]
}()
/// classLabel as string value
lazy var classLabel: String = {
[unowned self] in return self.provider.featureValue(for: "classLabel")!.stringValue
}()
Model output definition in iOS without class labels conversion:
init(Identity: MLMultiArray) {
self.provider = try! MLDictionaryFeatureProvider(dictionary: ["Identity" : MLFeatureValue(multiArray: Identity)])
}
Classification Code:
class ColorStyleVisionManager: NSObject {
static let shared = ColorStyleVisionManager()
static let MODEL = hair_color_class_labels().model
var colorStyle = String()
var hairColorFlag: Int = 0
private lazy var predictionRequest: VNCoreMLRequest = {
do{
let model = try VNCoreMLModel(for: ColorStyleVisionManager.MODEL)
let request = VNCoreMLRequest(model: model)
request.imageCropAndScaleOption = VNImageCropAndScaleOption.centerCrop
return request
} catch {
fatalError("can't load Vision ML Model")
}
}()
func predict(image:CIImage) -> String {
guard let result = predictionRequest.results as? [VNCoreMLFeatureValueObservation] else {
fatalError("model failed to process image")
}
let firstResult = result.first
if firstResult?.featureName == "0" {
colorStyle = "Plain Coloring"
hairColorFlag = 1
}
else if firstResult?.featureName == "1" {
colorStyle = "Ombre"
hairColorFlag = 2
}
else if firstResult?.featureName == "2" {
colorStyle = "Sombre"
hairColorFlag = 2
}
else if firstResult?.featureName == "3" {
colorStyle = "HighLight"
hairColorFlag = 3
}
else if firstResult?.featureName == "4" {
colorStyle = "LowLight"
hairColorFlag = 3
}
else if firstResult?.featureName == "5" {
colorStyle = "Color Melt"
hairColorFlag = 5
}
else if firstResult?.featureName == "6" {
colorStyle = "Dip Dye"
hairColorFlag = 4
}
else {}
let handler = VNImageRequestHandler(ciImage: image)
do {
try handler.perform([predictionRequest])
} catch {
print("error handler")
}
return colorStyle
}
}
I found two different problems in my code. To make sure my model was converted to an mlmodel correctly, I created a new classification mlmodel with Apple's Create ML tool. By the way, it is fantastic, even though the accuracy seems lower than my original model's. I compared the input and output types of the two models, and my converted mlmodel seems to be correct too. Then I tried again with the Create ML model. It crashed again. I wasn't sure which prediction result type to expect, VNClassificationObservation or VNCoreMLFeatureValueObservation, so I changed the cast to VNClassificationObservation. It crashed again. Then I realized that my handler definition was below the line that crashed, so the request had not been performed yet when I read its results; I moved the handler above it. Then voilà, it worked. I double-checked by changing the cast back to VNCoreMLFeatureValueObservation, and it crashed again. So both problems are solved. Please see the corrected code below.
I strongly recommend using the Create ML tool to confirm that your model conversion works, for debugging purposes. It only takes a few minutes.
class ColorStyleVisionManager: NSObject {
static let shared = ColorStyleVisionManager()
static let MODEL = hair_color_class_labels().model
var colorStyle = String()
var hairColorFlag: Int = 0
private lazy var predictionRequest: VNCoreMLRequest = {
do{
let model = try VNCoreMLModel(for: ColorStyleVisionManager.MODEL)
let request = VNCoreMLRequest(model: model)
request.imageCropAndScaleOption = VNImageCropAndScaleOption.centerCrop
return request
} catch {
fatalError("can't load Vision ML Model")
}
}()
func predict(image:CIImage) -> String {
let handler = VNImageRequestHandler(ciImage: image)
do {
try handler.perform([predictionRequest])
} catch {
print("error handler")
}
guard let result = predictionRequest.results as? [VNClassificationObservation] else {
fatalError("error to process request")
}
let firstResult = result.first
print(firstResult!)
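The corrected snippet stops at the print call. For the remaining step, mapping the top result to a style, a lookup table can replace the if/else chain from the question. This is a sketch under one assumption: the model's class labels are literally "0" through "6", as the original comparisons suggest, and the string to look up would be the top observation's identifier. `ColorStyle`, `colorStylesByIdentifier`, and `colorStyle(forIdentifier:)` are hypothetical names.

```swift
// Hypothetical value type bundling the two properties set in predict(image:).
struct ColorStyle: Equatable {
    let name: String
    let hairColorFlag: Int
}

// Same label-to-style pairs as the if/else chain in the question.
// Assumes the model's class labels are the strings "0" through "6".
let colorStylesByIdentifier: [String: ColorStyle] = [
    "0": ColorStyle(name: "Plain Coloring", hairColorFlag: 1),
    "1": ColorStyle(name: "Ombre", hairColorFlag: 2),
    "2": ColorStyle(name: "Sombre", hairColorFlag: 2),
    "3": ColorStyle(name: "HighLight", hairColorFlag: 3),
    "4": ColorStyle(name: "LowLight", hairColorFlag: 3),
    "5": ColorStyle(name: "Color Melt", hairColorFlag: 5),
    "6": ColorStyle(name: "Dip Dye", hairColorFlag: 4),
]

// Returns nil for an unknown label instead of silently keeping stale state.
func colorStyle(forIdentifier identifier: String) -> ColorStyle? {
    return colorStylesByIdentifier[identifier]
}
```

Inside predict(image:), the result of `colorStyle(forIdentifier:)` could be unwrapped with `if let` after the request has been performed, which also avoids the force unwrap on `firstResult`.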
We created an mlmodel in a playground, following https://developer.apple.com/documentation/createml/creating_an_image_classifier_model.
Then we used the following code to try to get bounding-box data for the objects in that model. But "results" only gives us the prediction values and the object names we trained on; that was exciting, but it is not our aim.
print("detectOurModelHandler \(results)") shows all the objects and prediction values in our mlmodel, and they are VNClassificationObservation instances, so it is no surprise that we do not get box data.
So I think the problem is how to create a model that returns VNRecognizedObjectObservation.
According to https://developer.apple.com/documentation/vision/recognizing_objects_in_live_capture we are supposed to get bounding-box data, but we cannot. print("detectOurModelHandler 2") is never called, and neither is dump(objectBounds).
By the way, we call findOurModels from captureOutput, about once per second, to test our model at the moment.
lazy var ourModel:VNCoreMLModel = { return try! VNCoreMLModel(for: ImageClassifier().model)}()
lazy var ourModelRequest: VNCoreMLRequest = {
return VNCoreMLRequest(model: ourModel, completionHandler: detectOurModelHandler)
}()
func findOurModels(pixelbuffer: CVPixelBuffer){
let testImage = takeAFrameImage(imageBuffer: pixelbuffer)
let imageForThis = testImage.cgImage
let requestOptions2:[VNImageOption : Any] = [:]
let handler = VNImageRequestHandler(cgImage: imageForThis!,
orientation: CGImagePropertyOrientation(rawValue: 6)!,
options: requestOptions2)
try? handler.perform([ourModelRequest])
}
func detectOurModelHandler(request: VNRequest, error: Error?) {
DispatchQueue.main.async(execute: {
if let results = request.results {
print("detectOurModelHandler \(results)")
for observation in results where observation is VNRecognizedObjectObservation {
print("detectOurModelHandler 2")
guard let objectObservation = observation as? VNRecognizedObjectObservation else {
continue
}
let objectBounds = VNImageRectForNormalizedRect(objectObservation.boundingBox, self.frameWidth, self.frameHeight)
dump(objectBounds)
}
}
})
}
This cannot be done with Create ML.
I have not tried it yet, but reportedly a model with bounding-box data can be created with Turi Create.
I'm looking through the Apple's Vision API documentation and I see a couple of classes that relate to text detection in UIImages:
1) class VNDetectTextRectanglesRequest
2) class VNTextObservation
It looks like they can detect characters, but I don't see a means to do anything with the characters. Once you've got characters detected, how would you go about turning them into something that can be interpreted by NSLinguisticTagger?
Here's a post that is a brief overview of Vision.
Thank you for reading.
This is how to do it ...
//
// ViewController.swift
//
import UIKit
import Vision
import CoreML
class ViewController: UIViewController {
//HOLDS OUR INPUT
var inputImage:CIImage?
//RESULT FROM OVERALL RECOGNITION
var recognizedWords:[String] = [String]()
//RESULT FROM RECOGNITION
var recognizedRegion:String = String()
//OCR-REQUEST
lazy var ocrRequest: VNCoreMLRequest = {
do {
//THIS MODEL IS TRAINED BY ME FOR FONT "Inconsolata" (Numbers 0...9 and UpperCase Characters A..Z)
let model = try VNCoreMLModel(for:OCR().model)
return VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
} catch {
fatalError("cannot load model")
}
}()
//OCR-HANDLER
func handleClassification(request: VNRequest, error: Error?)
{
guard let observations = request.results as? [VNClassificationObservation]
else {fatalError("unexpected result") }
guard let best = observations.first
else { fatalError("cant get best result")}
self.recognizedRegion = self.recognizedRegion.appending(best.identifier)
}
//TEXT-DETECTION-REQUEST
lazy var textDetectionRequest: VNDetectTextRectanglesRequest = {
return VNDetectTextRectanglesRequest(completionHandler: self.handleDetection)
}()
//TEXT-DETECTION-HANDLER
func handleDetection(request:VNRequest, error: Error?)
{
guard let observations = request.results as? [VNTextObservation]
else {fatalError("unexpected result") }
// EMPTY THE RESULTS
self.recognizedWords = [String]()
//NEEDED BECAUSE OF DIFFERENT SCALES
let transform = CGAffineTransform.identity.scaledBy(x: (self.inputImage?.extent.size.width)!, y: (self.inputImage?.extent.size.height)!)
//A REGION IS LIKE A "WORD"
for region:VNTextObservation in observations
{
guard let boxesIn = region.characterBoxes else {
continue
}
//EMPTY THE RESULT FOR REGION
self.recognizedRegion = ""
//A "BOX" IS THE POSITION IN THE ORIGINAL IMAGE (SCALED FROM 0... 1.0)
for box in boxesIn
{
//SCALE THE BOUNDING BOX TO PIXELS
let realBoundingBox = box.boundingBox.applying(transform)
//TO BE SURE
guard (inputImage?.extent.contains(realBoundingBox))!
else { print("invalid detected rectangle"); return}
//SCALE THE POINTS TO PIXELS
let topleft = box.topLeft.applying(transform)
let topright = box.topRight.applying(transform)
let bottomleft = box.bottomLeft.applying(transform)
let bottomright = box.bottomRight.applying(transform)
//LET'S CROP AND RECTIFY
let charImage = inputImage?
.cropped(to: realBoundingBox)
.applyingFilter("CIPerspectiveCorrection", parameters: [
"inputTopLeft" : CIVector(cgPoint: topleft),
"inputTopRight" : CIVector(cgPoint: topright),
"inputBottomLeft" : CIVector(cgPoint: bottomleft),
"inputBottomRight" : CIVector(cgPoint: bottomright)
])
//PREPARE THE HANDLER
let handler = VNImageRequestHandler(ciImage: charImage!, options: [:])
//SOME OPTIONS (TO PLAY WITH..)
self.ocrRequest.imageCropAndScaleOption = VNImageCropAndScaleOption.scaleFill
//FEED THE CHAR-IMAGE TO OUR OCR-REQUEST - NO NEED TO SCALE IT - VISION WILL DO IT FOR US !!
do {
try handler.perform([self.ocrRequest])
} catch { print("Error")}
}
//APPEND RECOGNIZED CHARS FOR THAT REGION
self.recognizedWords.append(recognizedRegion)
}
//THATS WHAT WE WANT - PRINT WORDS TO CONSOLE
DispatchQueue.main.async {
self.PrintWords(words: self.recognizedWords)
}
}
func PrintWords(words:[String])
{
// VOILA'
print(recognizedWords)
}
func doOCR(ciImage:CIImage)
{
//PREPARE THE HANDLER
let handler = VNImageRequestHandler(ciImage: ciImage, options:[:])
//WE NEED A BOX FOR EACH DETECTED CHARACTER
self.textDetectionRequest.reportCharacterBoxes = true
self.textDetectionRequest.preferBackgroundProcessing = false
//FEED IT TO THE QUEUE FOR TEXT-DETECTION
DispatchQueue.global(qos: .userInteractive).async {
do {
try handler.perform([self.textDetectionRequest])
} catch {
print ("Error")
}
}
}
override func viewDidLoad() {
super.viewDidLoad()
// Do any additional setup after loading the view, typically from a nib.
//LETS LOAD AN IMAGE FROM RESOURCE
let loadedImage:UIImage = UIImage(named: "Sample1.png")! //TRY Sample2, Sample3 too
//WE NEED A CIIMAGE - NOT NEEDED TO SCALE
inputImage = CIImage(image:loadedImage)!
//LET'S DO IT
self.doOCR(ciImage: inputImage!)
}
override func didReceiveMemoryWarning() {
super.didReceiveMemoryWarning()
// Dispose of any resources that can be recreated.
}
}
You'll find the complete project here; the trained model is included!
SwiftOCR
I just got SwiftOCR to work with small sets of text.
https://github.com/garnele007/SwiftOCR
uses
https://github.com/Swift-AI/Swift-AI
which uses NeuralNet-MNIST model for text recognition.
TODO : VNTextObservation > SwiftOCR
Will post an example using VNTextObservation once I have connected one to the other.
OpenCV + Tesseract OCR
I tried to use OpenCV + Tesseract but got compile errors then found SwiftOCR.
SEE ALSO : Google Vision iOS
Note: the Google Vision Text Recognition Android SDK has text detection, and there is also an iOS CocoaPod. So keep an eye on it, as text recognition should be added to the iOS version eventually.
https://developers.google.com/vision/text-overview
//Correction: just tried it but only Android version of the sdk supports text detection.
https://developers.google.com/vision/text-overview
If you subscribe to releases:
https://libraries.io/cocoapods/GoogleMobileVision
Click SUBSCRIBE TO RELEASES
you can see when TextDetection is added to the iOS part of the Cocoapod
Apple finally updated Vision to do OCR. Open a playground and dump a couple of test images in the Resources folder. In my case, I called them "demoDocument.jpg" and "demoLicensePlate.jpg".
The new class is called VNRecognizeTextRequest. Dump this in a playground and give it a whirl:
import Vision
enum DemoImage: String {
case document = "demoDocument"
case licensePlate = "demoLicensePlate"
}
class OCRReader {
func performOCR(on url: URL?, recognitionLevel: VNRequestTextRecognitionLevel) {
guard let url = url else { return }
let requestHandler = VNImageRequestHandler(url: url, options: [:])
let request = VNRecognizeTextRequest { (request, error) in
if let error = error {
print(error)
return
}
guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
for currentObservation in observations {
let topCandidate = currentObservation.topCandidates(1)
if let recognizedText = topCandidate.first {
print(recognizedText.string)
}
}
}
request.recognitionLevel = recognitionLevel
try? requestHandler.perform([request])
}
}
func url(for image: DemoImage) -> URL? {
return Bundle.main.url(forResource: image.rawValue, withExtension: "jpg")
}
let ocrReader = OCRReader()
ocrReader.performOCR(on: url(for: .document), recognitionLevel: .fast)
There's an in-depth discussion of this from WWDC19
Adding my own progress on this, in case anyone has a better solution:
I've successfully drawn the region box and character boxes on screen. The vision API of Apple is actually very performant. You have to transform each frame of your video to an image and feed it to the recogniser. It's much more accurate than feeding directly the pixel buffer from the camera.
if #available(iOS 11.0, *) {
guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {return}
var requestOptions:[VNImageOption : Any] = [:]
if let camData = CMGetAttachment(sampleBuffer, kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, nil) {
requestOptions = [.cameraIntrinsics:camData]
}
let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
orientation: 6,
options: requestOptions)
let request = VNDetectTextRectanglesRequest(completionHandler: { (request, _) in
guard let observations = request.results else {print("no result"); return}
let result = observations.map({$0 as? VNTextObservation})
DispatchQueue.main.async {
self.previewLayer.sublayers?.removeSubrange(1...)
for region in result {
guard let rg = region else {continue}
self.drawRegionBox(box: rg)
if let boxes = region?.characterBoxes {
for characterBox in boxes {
self.drawTextBox(box: characterBox)
}
}
}
}
})
request.reportCharacterBoxes = true
try? imageRequestHandler.perform([request])
}
}
Now I'm trying to actually recognize the text. Apple doesn't provide any built-in OCR model, and I want to use CoreML for that, so I'm trying to convert a Tesseract trained data model to CoreML.
You can find Tesseract models here: https://github.com/tesseract-ocr/tessdata and I think the next step is to write a coremltools converter that supports that type of input and outputs a .mlmodel file.
Or, you can link to TesseractiOS directly and try to feed it with your region boxes and character boxes you get from the Vision API.
Thanks to a GitHub user, you can test an example: https://gist.github.com/Koze/e59fa3098388265e578dee6b3ce89dd8
- (void)detectWithImageURL:(NSURL *)URL
{
VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithURL:URL options:@{}];
VNDetectTextRectanglesRequest *request = [[VNDetectTextRectanglesRequest alloc] initWithCompletionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error) {
if (error) {
NSLog(@"%@", error);
}
else {
for (VNTextObservation *textObservation in request.results) {
// NSLog(@"%@", textObservation);
// NSLog(@"%@", textObservation.characterBoxes);
NSLog(@"%@", NSStringFromCGRect(textObservation.boundingBox));
for (VNRectangleObservation *rectangleObservation in textObservation.characterBoxes) {
NSLog(@" |-%@", NSStringFromCGRect(rectangleObservation.boundingBox));
}
}
}
}];
request.reportCharacterBoxes = YES;
NSError *error;
[handler performRequests:@[request] error:&error];
if (error) {
NSLog(@"%@", error);
}
}
}
The thing is, the result is an array of bounding boxes for each detected character. From what I gathered from Vision's session, I think you are supposed to use CoreML to detect the actual chars.
Recommended WWDC 2017 talk: Vision Framework: Building on Core ML (haven't finished watching it either), have a look at 25:50 for a similar example called MNISTVision
Here's another nifty app demonstrating the use of Keras (Tensorflow) for the training of a MNIST model for handwriting recognition using CoreML: Github
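Before a character box can be cropped and handed to a classifier, it has to be converted from Vision's normalized coordinates (values from 0 to 1, origin at the lower-left corner) to pixels. The sketch below shows that arithmetic for a top-left-origin coordinate space such as UIKit's. `Box` and `pixelRect(fromNormalized:imageWidth:imageHeight:)` are hypothetical names used so the example runs without CoreGraphics; real code would work with CGRect.

```swift
// Hypothetical stand-in for CGRect, so the sketch runs without CoreGraphics.
struct Box: Equatable {
    var x: Double
    var y: Double
    var width: Double
    var height: Double
}

// Scales a normalized, lower-left-origin box to pixels and flips the
// y axis so the result uses a top-left origin.
func pixelRect(fromNormalized box: Box, imageWidth: Double, imageHeight: Double) -> Box {
    return Box(
        x: box.x * imageWidth,
        y: (1.0 - box.y - box.height) * imageHeight,
        width: box.width * imageWidth,
        height: box.height * imageHeight
    )
}

let characterBox = Box(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let rect = pixelRect(fromNormalized: characterBox, imageWidth: 400, imageHeight: 200)
```

When the crop is done on a CIImage, which also uses a lower-left origin, the flip is not needed and a plain scale by the image width and height is enough.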
I'm using Google's Tesseract OCR engine to convert the images into actual strings. You'll have to add it to your Xcode project using cocoapods. Although Tesseract will perform OCR even if you simply feed the image containing texts to it, the way to make it perform better/faster is to use the detected text rectangles to feed pieces of the image that actually contain text, which is where Apple's Vision Framework comes in handy.
Here's a link to the engine:
Tesseract OCR
And here's a link to the current stage of my project that has text detection + OCR already implemented:
Out Loud - Camera to Speech
Hope these can be of some use. Good luck!
For those still looking for a solution I wrote a quick library to do this. It uses both the Vision API and Tesseract and can be used to achieve the task the question describes with one single method:
func sliceaAndOCR(image: UIImage, charWhitelist: String, charBlackList: String = "", completion: @escaping ((_: String, _: UIImage) -> Void))
This method looks for text in your image and returns the string found, along with a slice of the original image showing where the text was found.
Firebase ML Kit does it for iOS (and Android) with their on-device Vision API and it outperforms Tesseract and SwiftOCR.