I'm trying to figure out the easiest way to run object detection from a Tensorflow model (Inception or mobilenet) in an iOS app.
I have iOS Tensorflow image classification working in my own app and network following this example
and have Tensorflow image classification and object detection working in Android for my own app and network following this example
but the iOS example does not contain object detection, only image classification, so how to extend the iOS example code to support object detection, or is there a complete example for this in iOS? (preferably objective-C)
I did find this and this, but it recompiles Tensorflow from source, which seems complex,
also found Tensorflow lite,
but again no object detection.
I also found an option of converting Tensorflow model to Apple Core ML, using Core ML, but this seems very complex, and could not find a complete example for object detection in Core ML
You need to train your own ML model. For iOS it will be easier to just use Core ML. Also tensorflow models can be exported in Core ML format. You can play with this sample and try different models. https://developer.apple.com/documentation/vision/recognizing_objects_in_live_capture
Or here:
https://github.com/ytakzk/CoreML-samples
So I ended up following this demo project,
https://github.com/csharpseattle/tensorflowiOS
It provided a working demo app/project, and was easy to switch its Tensorflow pb file for my own trained network file.
The instructions in the readme are pretty straight forward.
You do need to checkout and recompile Tensorflow, which takes several hours and 10gb of space. I did have the thread issue, used the gsed instructions, which worked. You also need to install Homebrew.
I have not looked at Core ML yet, but from what I have read converting from Tensorflow to Core ML is complicated, and you may loose parts of your model.
It ran quite fast on iPhone, even using an Inception model instead of Mobilenet.
Related
I have trained a YoloV4 CNN. It's pretty good already. I want more images as training data but there is no point of manually annotate most of the stuff because CNN can do it for me. I could review and re-correct if there are any issues. Is there a image annotation tool/service that can do that? I'm currently using Supervisely. I also tried CVAT and VoTT Couldn't find such feature.
I created a tool python project to generate supervisely project using darknet. It's available on github.
https://github.com/s1n7ax/partially-annotate
I have just trained a model with satisfactory results and I have the frozen_inference_graph.pb . How would I go about running this on iOS? It was trained on SSD Mobilenet V1 if that helps. Optimally I'd like to run it using the GPU (I know the tensorflow API can't do that on iOS), but it would be great to just have it on CPU first.
Support was just announced for importing TensorFlow models into Core ML. This is accomplished using the tfcoreml converter, which should take in your .pb graph and output a Core ML model. From there, you can use this model with Core ML and either take in still images or video frames for processing.
At that point, it's up to you to make sure you're providing the correct input colorspace and size, then extracting and processing the SSD results correctly to get your object classes and bounding boxes.
I use VNImageRequestHandler and VNDetectRectanglesRequest to handle request to find rectangles in a image. But since Vision in iOS11 only provide barcode、rectangle、face finding,but I want to find cars in an image ,what should I change code to find specify object in an image?
If you’re looking for Apple to create an API named VNDetectCarRequest you should probably file a feature request. (And if it happens, I’m sure the “Apple is making a car!” rumor mill will start up again...)
For general-purpose image recognition, the path to take with Vision is to use VNCoreMLRequest and supply a machine learning model trained for the image recognition task you have in mind.
On the native programming side, all image recognition/classification tasks are the same — you can start by reusing Apple’s Classifying Images with Vision and Core ML sample code, which sets up VNCoreMLRequest and handles the VNClassificationObservation results it produces. The special sauce that changes a general “what is this” classifier into a “hotdog or not a hotdog” classifier or a “what kind of vehicle is this (if it’s one at all)” classifier is all in the model.
There might be a machine learning model that already does the task you’re looking for out there — if you find one, you can wrap it in a Core ML Model file using the scripts Apple provides.
Otherwise, you’ll need to look at one of the general purpose image classifier models out there (again, there are several already conveniently gathered on developer.apple.com) and work on specializing / retraining it to your more specific task. That part of your work is outside Apple’s API ecosystem, and there are many possible options. Web searches for “train caffe image model” or “train keras image model” or similar should be helpful there.
Once you’ve trained your model, use the Core ML tools to get it into Core ML to use with Vision.
For some time, I have been using OpenCV. It satisfied all my needs of feature extraction, matching and clustering(k-means till now) and classification(SVM). Recently, I came across Apache Mahout. But, most of the algorithms for machine learning are already available in OpenCV as well. Are there any advantages of using Mahout over OpenCV if the work relates to Videos and Images ?
This question might be put on hold since it is opinion based. I still want to add a basic comparison.
OpenCV is capable of anything about vision and ml that is possibly researched, or invented. The vision literature is based on it, and it develops according to the literature. Even the newborn ml algorithms -like TLD, originated on MATLAB- (http://www.tldvision.com/) can also be implemented using OpenCV (http://gnebehay.github.io/OpenTLD/) with some effort.
Mahout is capable, too and specific to ml. It includes not only the well known ml algorithms, but also the specific ones. Say you came across to a paper "Processing Apples with K-means Orientation Filtering". You can find OpenCV implementations of this paper all around the web. Even the actual algorithm might be open source and developed using OpenCV. With OpenCV, say it takes 500 lines of code, but with Mahout, the paper might be already implemented with a single method making everything easier
An example about this case is http://en.wikipedia.org/wiki/Canopy_clustering_algorithm, which is harder to implement using OpenCV right now.
Since you are going to work with image data sets you will need to learn about HIPI, too.
To sum up, here is a simple pro-con table:
know-how (learning curve): OpenCV is easier, since you already know about it. Mahout+HIPI will take more time.
examples: Literature + vision community commonly use OpenCV. Open source algorithms are mostly created with C++ api of OpenCV.
ml algorithms: Mahout is only about ml, whereas OpenCV is more generic. Still OpenCV has access to basic ml algorithms.
development: Mahout is easier to work with in terms of coding and time complexity (I am not sure about the latter, but I reckon it is).
I am struggling to create a custom haar classifier. I have found a couple tutorials on the web, but they do not specify which version of opencv they are using. What I need is a very concise and simplified example of the steps that are required, along with a simple dataset of images. I also need to know the opencv version and the OS platform so I can get it running. I have tried a matrix of opencv versions on both windows and linux and I have run into memory error after memory error. I would like to start with a known good set of data and simple commands before expanding it to fit my problem.
Thanks for your help,
Chris
OpenCV provides two utility commands createsamples.exe and haartraining.exe, which can generate xml files used by Haar Classifiers. That is, with the xml file outputted from haartraining.exe, you can directly use the face detection sample with your xml file to detect any customized objects.
About the detailed procedures to use the commands, you may consult Page 513-516 in the book "Learning OpenCV", or this tutorial.
About the internal mechanism of how the classifier works, you may consult the paper "Rapid Object Detection using a Boosted Cascade of Simple
Features", which has been cited 5500+ times.