I've been playing with Apple's CoreML and Vision APIs.
My goal is to make a simple proof of concept that can recognize nails in a picture of a hand. This is very specific.
I have been trying to find documentation on how to create your own VNRequest, and I really have no idea how to do this.
I know that the Vision API offers rectangle, face and text recognition only...
How can I make my own request to teach Vision how to recognize what I want in a picture?
You will have to create (or find) a Core ML model that can do this. There is at least one open source model that can detect nails, so you'd have to convert this to Core ML. And then you use VNCoreMLRequest to run this model using Vision.
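If the model you find turns out to be a Keras model, the conversion step might look roughly like the sketch below. This is only an illustration: the file name, the class labels, and the use of the older coremltools Keras converter are assumptions, not part of the answer above.

```python
import coremltools

# Hypothetical: an open source nail detector saved as a Keras .h5 file
mlmodel = coremltools.converters.keras.convert(
    "nail_detector.h5",
    input_names=["image"],
    image_input_names=["image"],      # expose the input as an image so Vision can feed it pixel buffers
    class_labels=["nail", "no_nail"],  # placeholder labels
)
mlmodel.save("NailDetector.mlmodel")   # add the .mlmodel to Xcode, then wrap it in VNCoreMLModel / VNCoreMLRequest
```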
My partner and I decided to implement a traffic light recognition program as a student project.
But we are absolute beginners in computer vision and have no idea how to start. (All we know is how to use OpenCV.)
Should we first learn image recognition, or just start with object tracking?
Our goal is to recognize traffic lights in a video, not just in a single image.
In my opinion, you should take a serious course on computer vision before going deeper.
A video is just a sequence of pictures, so you can use OpenCV to read each frame and process it.
For your current project, simple object detection using HOG features should be more than enough. A rough sketch of that approach is below.
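As an illustration only (a Python/OpenCV sketch, assuming a 64x64 training window; the folder names, video file, and stride are placeholders you would tune for traffic lights):

```python
import glob
import cv2
import numpy as np

# HOG descriptor over a fixed 64x64 window (window, block, block stride, cell, bins)
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

def hog_features(path):
    """Load a patch, resize it to the training window, and return its HOG feature vector."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return hog.compute(cv2.resize(img, (64, 64))).flatten()

# Positive patches contain a traffic light, negatives are random background crops (paths are placeholders)
pos = [hog_features(p) for p in glob.glob("traffic_lights/*.png")]
neg = [hog_features(p) for p in glob.glob("background/*.png")]

X = np.array(pos + neg, dtype=np.float32)
y = np.array([1] * len(pos) + [0] * len(neg), dtype=np.int32)

# Linear SVM trained on the HOG features
svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.train(X, cv2.ml.ROW_SAMPLE, y)

# A video is just a sequence of frames: read them one by one and scan each with a sliding window
cap = cv2.VideoCapture("dashcam.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for yy in range(0, gray.shape[0] - 64, 32):
        for xx in range(0, gray.shape[1] - 64, 32):
            patch = gray[yy:yy + 64, xx:xx + 64]
            feat = hog.compute(patch).flatten().astype(np.float32).reshape(1, -1)
            _, pred = svm.predict(feat)
            if pred[0][0] == 1:
                cv2.rectangle(frame, (xx, yy), (xx + 64, yy + 64), (0, 255, 0), 2)
    cv2.imshow("traffic lights", frame)
    if cv2.waitKey(1) == 27:   # Esc to stop
        break
```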
There's a tutorial at http://www.hackevolve.com/create-your-own-object-detector/. It's easy to understand and the source code is available, so you can move quickly.
Good luck.
Here's some research I have done so far:
- I have used Google Vision API to detect various face landmarks.
Here's the reference: https://developers.google.com/vision/introduction
Here's the link to sample code that extracts the facial landmarks using the same Google Vision API: https://github.com/googlesamples/ios-vision
I have gone through various blogs on the internet that say MSQRD is based on Google Cloud Vision. Here's the link: https://medium.com/@AlexioCassani/how-to-create-a-msqrd-like-app-with-google-cloud-vision-802b578b30a0
For Android here's the reference:
https://www.raywenderlich.com/158580/augmented-reality-android-googles-face-api
There are multiple paid SDKs that fulfill the purpose, but they are highly priced, so I can't afford them.
For instance:
1) https://deepar.ai/contact/
2) https://www.luxand.com/
Some might see this question as a duplicate of this one:
Face filter implementation like MSQRD/SnapChat
But that thread is almost 1.6 years old, with no correct answers.
I have gone through this article:
https://dzone.com/articles/mimic-snapchat-filters-programmatically-1
It describes all the essential steps to achieve the desired result, but they advise using their own SDK.
As per my research, there is no good material around that helps achieve results like MSQRD's face filters.
There is one more GitHub repository with a similar implementation, but it doesn't give much information:
https://github.com/rootkit/LiveFaceMask
Now my question is:
If we have the facial landmarks from the Google Vision API (or even from dlib), how can I add 2D or 3D models over them? In what format does this need to be done? Does it require some X,Y coordinates with vertex calculations?
NOTE: I have gone through Google's "GooglyEyesDemo", which adds a preview layer over the eyes. It basically adds a view over the face. I don't want to add one-dimensional UIView preview layers over it. Image attached for reference: https://developers.google.com/vision/ios/face-tracker-tutorial
Creating Models: I also want to know how to create models for live filters like MSQRD. I welcome any software or format recommendations.
I hope the research I have done will help others, and that someone else's experience helps me achieve the desired results. Let me know if any more details are required.
Thanks
Harry
The Canvas class is used on Android for drawing such 2D/3D models; on iOS, Core Graphics can be used.
What you can do is detect the face components, take their location points, and draw images on top of them. Consider going through this
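In Python terms, the idea looks roughly like the following sketch using dlib and OpenCV (the question mentions dlib); on iOS/Android you would do the equivalent drawing with Core Graphics or Canvas. The image path, sticker file, landmark index, and scaling are assumptions for illustration, and the blending does no bounds checking near the image edges.

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# The 68-point predictor file is a placeholder path; it comes from the dlib model zoo
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

frame = cv2.imread("face.jpg")
sticker = cv2.imread("mustache.png", cv2.IMREAD_UNCHANGED)  # RGBA overlay image (placeholder)

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for face in detector(gray):
    shape = predictor(gray, face)
    # Landmarks 48-67 outline the mouth; anchor the sticker just above the upper lip (point 51)
    top_lip = shape.part(51)
    w = face.width()                       # scale the sticker to the face width
    h = int(sticker.shape[0] * w / sticker.shape[1])
    resized = cv2.resize(sticker, (w, h))
    x, y = top_lip.x - w // 2, top_lip.y - h

    # Alpha-blend the RGBA sticker onto the frame at (x, y); no bounds checking in this sketch
    roi = frame[y:y + h, x:x + w]
    alpha = resized[:, :, 3:] / 255.0
    roi[:] = (1 - alpha) * roi + alpha * resized[:, :, :3]

cv2.imwrite("face_with_filter.jpg", frame)
```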
You need to either predict x, y, z coordinates (check out this demo), or use x, y predictions and then find the parameters of a universal 3D model and camera that give the closest projection of the current x, y points.
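The second route (fit a universal 3D model plus camera to the detected 2D points) is essentially a perspective-n-point problem, which OpenCV's solvePnP solves. A hedged sketch, where the 3D reference points, the landmark pixel positions, and the camera intrinsics are all purely illustrative numbers:

```python
import numpy as np
import cv2

# A few points of a generic 3D face model (nose tip, chin, eye corners, mouth corners),
# in an arbitrary model coordinate system -- these numbers are only illustrative.
model_points = np.array([
    (0.0, 0.0, 0.0),        # nose tip
    (0.0, -63.6, -12.5),    # chin
    (-43.3, 32.7, -26.0),   # left eye outer corner
    (43.3, 32.7, -26.0),    # right eye outer corner
    (-28.9, -28.9, -24.1),  # left mouth corner
    (28.9, -28.9, -24.1),   # right mouth corner
], dtype=np.float64)

# The corresponding detected 2D landmarks (x, y pixel positions, placeholders here)
image_points = np.array([
    (359, 391), (399, 561), (337, 297),
    (513, 301), (345, 465), (453, 469),
], dtype=np.float64)

# A rough pinhole camera: focal length ~ image width, principal point at the image center
w, h = 640, 480
camera_matrix = np.array([[w, 0, w / 2],
                          [0, w, h / 2],
                          [0, 0, 1]], dtype=np.float64)
dist_coeffs = np.zeros((4, 1))  # assume no lens distortion

# Find the head pose (rotation + translation) that best projects the 3D model onto the 2D landmarks
ok, rvec, tvec = cv2.solvePnP(model_points, image_points, camera_matrix, dist_coeffs)

# Any extra 3D point attached to the model (e.g. a vertex of a mask) can now be projected
# into the image with the same pose and drawn at that position
extra_3d = np.array([(0.0, 40.0, 20.0)])
projected, _ = cv2.projectPoints(extra_3d, rvec, tvec, camera_matrix, dist_coeffs)
```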
I use VNImageRequestHandler and VNDetectRectanglesRequest to handle a request to find rectangles in an image. But Vision in iOS 11 only provides barcode, rectangle, and face finding, and I want to find cars in an image. How should I change the code to find a specific object in an image?
If you’re looking for Apple to create an API named VNDetectCarRequest you should probably file a feature request. (And if it happens, I’m sure the “Apple is making a car!” rumor mill will start up again...)
For general-purpose image recognition, the path to take with Vision is to use VNCoreMLRequest and supply a machine learning model trained for the image recognition task you have in mind.
On the native programming side, all image recognition/classification tasks are the same — you can start by reusing Apple’s Classifying Images with Vision and Core ML sample code, which sets up VNCoreMLRequest and handles the VNClassificationObservation results it produces. The special sauce that changes a general “what is this” classifier into a “hotdog or not a hotdog” classifier or a “what kind of vehicle is this (if it’s one at all)” classifier is all in the model.
There might be a machine learning model that already does the task you’re looking for out there — if you find one, you can wrap it in a Core ML Model file using the scripts Apple provides.
Otherwise, you’ll need to look at one of the general purpose image classifier models out there (again, there are several already conveniently gathered on developer.apple.com) and work on specializing / retraining it to your more specific task. That part of your work is outside Apple’s API ecosystem, and there are many possible options. Web searches for “train caffe image model” or “train keras image model” or similar should be helpful there.
Once you’ve trained your model, use the Core ML tools to get it into Core ML to use with Vision.
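For the "specialize/retrain a general classifier" step, a rough Keras sketch might look like the following. This is only a sketch: the MobileNet backbone, the three vehicle classes, the folder layout, and the hyperparameters are all assumptions, and the final conversion call uses the older coremltools Keras converter.

```python
import coremltools
from keras.applications import MobileNet
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator

# Start from an ImageNet-pretrained MobileNet and replace its head with your own classes
base = MobileNet(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base.output)
out = Dense(3, activation="softmax")(x)   # e.g. "car", "truck", "no vehicle" -- your labels
model = Model(inputs=base.input, outputs=out)

for layer in base.layers:                 # freeze the pretrained backbone, train only the new head
    layer.trainable = False
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# "vehicles/train" is a placeholder folder with one subdirectory per class
train = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "vehicles/train", target_size=(224, 224), batch_size=32)
model.fit_generator(train, epochs=5)

# Convert the retrained model so Vision can run it through VNCoreMLRequest
mlmodel = coremltools.converters.keras.convert(
    model, input_names=["image"], image_input_names=["image"],
    class_labels=["car", "truck", "no vehicle"])
mlmodel.save("VehicleClassifier.mlmodel")
```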
I read the official documentation for the API, but I wanted to make sure it can perform object recognition in images. More specifically, my idea is to provide a lot of images of parking lots along with the number of parking spots currently available. I want a model that predicts how many spots are available given an image of the parking lot.
Does anybody have previous experience with using the API for a similar goal?
No, I don't think the Google Prediction API will work for image recognition, because the Prediction API only understands numeric and string data.
For image recognition, the Google Vision API is the best option. I think it can't recognize individual humans or persons, but it can recognize places like the Eiffel Tower.
It can even read strings written in an image.
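For reference, a minimal sketch of calling the Cloud Vision API from Python (the image path is a placeholder, and depending on the client library version the Image type may live under vision.types instead):

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()      # needs GOOGLE_APPLICATION_CREDENTIALS to be set

with open("parking_lot.jpg", "rb") as f:    # placeholder image path
    image = vision.Image(content=f.read())

# Labels: generic things the API recognises in the picture ("parking", "car", "Eiffel Tower", ...)
labels = client.label_detection(image=image).label_annotations
for label in labels:
    print(label.description, label.score)

# Text detection: reads strings written in the image (signs, spot numbers, ...)
texts = client.text_detection(image=image).text_annotations
if texts:
    print(texts[0].description)
```

By itself this only returns labels and text; counting free parking spots would still require a custom model or extra processing on top of these results.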
Currently I'm trying to develop a program that can do skeleton tracking.
From the research I've been doing, I found out that the best way to tackle this problem is by using an RGB-depth camera such as the Kinect.
Challenge: MS Kinect does not support skeleton tracking for this, so I need to build custom skeleton tracking.
First problem: how do I detect with an RGB-depth camera?
What I found: use a machine learning algorithm.
Question: Is machine learning the only option for detection? Do I need the depth information for detection?
You can use the depth channel to segment the target silhouette, and then extract descriptive features and classify with them.
I'm not sure if what you need is to classify a target as human or animal, or whether you need to find what type of animal. If you need the former, features such as aspect ratio are very simple and good separators. If you need to classify which animal, it depends on the list of classes. Some cases are easier and some are harder.
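As a rough illustration of that segment-then-classify idea (an OpenCV 4.x sketch, assuming a saved 16-bit depth frame in millimetres; the depth band, the aspect-ratio threshold, and the "human vs. animal" rule are purely illustrative):

```python
import cv2
import numpy as np

# Depth frame as a 16-bit image in millimetres (e.g. saved from the Kinect SDK); path is a placeholder
depth = cv2.imread("depth_frame.png", cv2.IMREAD_ANYDEPTH).astype(np.float32)

# Keep only pixels within the band of distances where the target is expected to stand
near, far = 500.0, 2500.0   # assumed working range in mm
mask = np.uint8((depth > near) & (depth < far)) * 255
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))  # clean up speckle noise

# Take the largest connected blob as the target silhouette (OpenCV 4.x return signature)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    target = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(target)

    # Simple descriptive features of the silhouette
    aspect_ratio = h / float(w)                  # upright humans tend to be tall and narrow
    extent = cv2.contourArea(target) / (w * h)   # how much of the bounding box the silhouette fills

    label = "human" if aspect_ratio > 1.5 else "animal"   # toy threshold, purely illustrative
    print(label, aspect_ratio, extent)
```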