MediaPipe vs MLKit Vision vs ARCore

There seems to be a lot of overlap between these 3 Google libraries.
According to their sites:
MediaPipe: MediaPipe offers cross-platform, customizable ML solutions for live and streaming media.
ARCore: With ARCore, build new augmented reality experiences that seamlessly blend the digital and physical worlds.
MLKit Vision: Video and image analysis APIs to label images and detect barcodes, text, faces, and objects.
Could someone with experience working with these explain how they relate to each other and what the use cases are for each?
For example, which would be appropriate for implementing high-level, popular features such as face filters?
(Also perhaps some insight on which of the 3 is most likely to land in Google Graveyard the fastest)

Some simplified & informal explanations:
MediaPipe is a powerful but lower-level library for live and streaming ML solutions, which requires non-trivial setup and customization before it works for your use case.
ML Kit is an end-to-end solution provider, offering mobile-friendly, easy-to-use APIs and pre-built pipelines under the hood. Several ML Kit features are actually powered by MediaPipe internally (e.g. pose detection and selfie segmentation).
There is no direct relationship between ARCore and ML Kit, though there could be shared or similar ML models between them; both require ML models to power their features, but the two products have different focuses.
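For something like face filters specifically, MediaPipe's face-mesh solution exposes the dense facial landmarks you would anchor a filter to. Here is a minimal sketch using the Python solutions API, just to show the shape of the workflow; the image path is a placeholder, and on mobile you would use the Android/iOS MediaPipe or ML Kit face APIs instead:

    import cv2
    import mediapipe as mp

    mp_face_mesh = mp.solutions.face_mesh

    # Static-image mode; for live filters you would stream camera frames instead
    with mp_face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as face_mesh:
        image = cv2.imread("face.jpg")  # placeholder image path
        results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

        if results.multi_face_landmarks:
            h, w = image.shape[:2]
            for lm in results.multi_face_landmarks[0].landmark:
                # 468 normalized landmarks per face; draw each as a dot to see the mesh
                cv2.circle(image, (int(lm.x * w), int(lm.y * h)), 1, (0, 255, 0), -1)
            cv2.imwrite("face_mesh.jpg", image)

A face-filter app would then map filter assets onto those landmark positions frame by frame, which is exactly the kind of custom pipeline MediaPipe leaves to you and ML Kit abstracts away.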

Related

Is there a way to get the functionality of ML Kit in the web browser?

I'm developing a cross-platform app (iOS/Android/web) and am loving the fast, cheap on-device image labeling feature of ML Kit on mobile. Is there a way to replicate the behavior on the web? Are the ML Kit models available for re-use with a different ML library so they can be repurposed?
Unfortunately, it does not seem like ML Kit allows you to export models created with it, only to import models. However, tensorflow.js lets you run TensorFlow models on the web. If you are looking for an easy way to create models, there are several web-based tools that let you easily build ML models and export them as TensorFlow Lite (which can be run with tensorflow.js or even hosted on Firebase). A couple I have heard of are lobe.ai and ml5.js. Hope this helps.
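If you do end up training your own model in Python, the export step to either format is short. A minimal sketch with a placeholder Keras model, assuming the tensorflow and tensorflowjs packages are installed:

    import tensorflow as tf
    import tensorflowjs as tfjs

    # Placeholder model; swap in your own trained image-labeling model
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(224, 224, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    # TensorFlow Lite export (usable as a custom model on mobile / Firebase ML)
    tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)

    # TensorFlow.js export (loadable in the browser with tf.loadLayersModel)
    tfjs.converters.save_keras_model(model, "tfjs_model")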

Is there a way to do real hand detection when using ARCore?

I want to do real hand detection while using ARCore, in order to extend some features. Unfortunately, ARCore doesn't support it.
So, as of May 2019, is there a way to do object detection while using ARCore?
I have trained a model with TensorFlow, but it seems it cannot work together with ARCore.

GMM adaptation to new data

I have been using the GMM cluster package by Bouman, for which I did not find any adaptation module online. Before I start reading up on GMM adaptation theory and implementing it myself, I would like to know if there are any other open-source GMM projects online that do all of training, testing, and adaptation to new data.
It might be late to answer this now, but for future reference I suggest the Bob library (specifically bob.bio.gmm), which provides a wide range of functionality for manipulating Gaussian mixture models for speech-related applications, including MAP adaptation and UBM generation.
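If you do end up implementing MAP adaptation yourself rather than using bob.bio.gmm, the mean-only adaptation step is short. A minimal sketch with scikit-learn standing in for the UBM trainer; the relevance factor, feature dimension, and data are placeholder assumptions:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def map_adapt_means(ubm: GaussianMixture, X: np.ndarray, relevance: float = 16.0):
        """Adapt only the means of a trained UBM to new data X (MAP, mean-only)."""
        # Responsibilities gamma[n, k] of each component for each sample
        gamma = ubm.predict_proba(X)                   # shape (N, K)
        n_k = gamma.sum(axis=0) + 1e-10                # soft counts per component
        # First-order statistics: expected sample per component
        e_k = (gamma.T @ X) / n_k[:, None]             # shape (K, D)
        # Data-dependent mixing coefficient alpha_k = n_k / (n_k + r)
        alpha = (n_k / (n_k + relevance))[:, None]
        return alpha * e_k + (1.0 - alpha) * ubm.means_

    # Train a UBM on background data, then adapt its means to new data
    background = np.random.randn(5000, 13)   # placeholder feature vectors
    new_data = np.random.randn(200, 13)
    ubm = GaussianMixture(n_components=8, covariance_type="diag").fit(background)
    adapted_means = map_adapt_means(ubm, new_data)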

API availability to track other objects apart from human gesture for Windows Kinect

The APIs shipped with the MS Windows Kinect SDK are all about programming around voice, movement, and gesture recognition related to humans.
Are there any open-source or commercial APIs for tracking and recognizing dynamically moving objects, such as vehicles, for classification?
Is it feasible, and a good approach, to employ the Kinect for automated vehicle classification rather than traditional image processing approaches?
Even though image processing technologies have made remarkable advances, why is fully automated vehicle classification not used at most toll collection points?
Why are existing technologies (except the RFID approach) failing to classify vehicles (i.e., they are not yet 100% accurate), or are there other reasons apart from image processing?
You will need to use a regular image-processing suite to track objects that are not supported by the Kinect API. A few options:
OpenCV
Emgu CV (OpenCV in .NET)
ImageMagick
There is no library that directly supports the depth capabilities of the Kinect, to my knowledge. As a result, using the Kinect over a regular camera would be of no benefit.
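As an illustration of the OpenCV route, here is a minimal background-subtraction sketch for picking out moving objects (e.g. vehicles) from a fixed camera; the video path and area threshold are placeholder assumptions, and the findContours call assumes the OpenCV 4.x return signature:

    import cv2

    cap = cv2.VideoCapture("traffic.mp4")  # placeholder video source
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

    while True:
        ok, frame = cap.read()
        if not ok:
            break

        # Foreground mask: moving pixels against the learned background model
        mask = subtractor.apply(frame)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                                cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5)))

        # Keep only reasonably large blobs as vehicle candidates
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            if cv2.contourArea(c) > 1500:  # area threshold is an assumption
                x, y, w, h = cv2.boundingRect(c)
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

        cv2.imshow("vehicles", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()

Classification on top of that (car vs. truck, etc.) would still need a separate classifier trained on the cropped detections.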

OpenCV vs Mahout for Computer Vision based Machine Learning?

For some time, I have been using OpenCV. It has satisfied all my needs for feature extraction, matching, clustering (k-means so far), and classification (SVM). Recently, I came across Apache Mahout. But most of the algorithms for machine learning are already available in OpenCV as well. Are there any advantages to using Mahout over OpenCV if the work relates to videos and images?
This question might be put on hold since it is opinion based. I still want to add a basic comparison.
OpenCV is capable of nearly anything in vision and ML that has been researched or invented. Much of the vision literature is based on it, and it develops along with the literature. Even newer ML algorithms, like TLD (which originated in MATLAB, http://www.tldvision.com/), can be implemented using OpenCV (http://gnebehay.github.io/OpenTLD/) with some effort.
Mahout is capable too, and is specific to ML. It includes not only the well-known ML algorithms but also more specialized ones. Say you come across a paper, "Processing Apples with K-means Orientation Filtering". You can find OpenCV implementations of such papers all around the web; the actual algorithm might even be open source and developed using OpenCV. With OpenCV it might take, say, 500 lines of code, but with Mahout the paper might already be implemented as a single method, making everything easier.
An example of this is canopy clustering (http://en.wikipedia.org/wiki/Canopy_clustering_algorithm), which is currently harder to implement using OpenCV. A k-means sketch on the OpenCV side is shown below for comparison.
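To illustrate how short the common algorithms are in OpenCV, here is a minimal k-means sketch; the descriptor data is random placeholder data, just to show the shapes involved:

    import numpy as np
    import cv2

    # Hypothetical descriptors: 1000 samples of 32-dimensional float features
    data = np.random.rand(1000, 32).astype(np.float32)

    # Stop after 10 iterations or when cluster centers move less than 1.0
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

    # K=8 clusters, 3 restarts with random initial centers
    compactness, labels, centers = cv2.kmeans(
        data, 8, None, criteria, 3, cv2.KMEANS_RANDOM_CENTERS)

    print(labels.shape, centers.shape)  # (1000, 1) and (8, 32)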
Since you are going to work with image data sets, you will need to learn about HIPI (the Hadoop Image Processing Interface), too.
To sum up, here is a simple pro-con table:
know-how (learning curve): OpenCV is easier, since you already know about it. Mahout+HIPI will take more time.
examples: The literature and the vision community commonly use OpenCV. Open-source algorithms are mostly created with the C++ API of OpenCV.
ml algorithms: Mahout is only about ML, whereas OpenCV is more generic. Still, OpenCV has access to the basic ML algorithms.
development: Mahout is easier to work with in terms of coding and time complexity (I am not sure about the latter, but I reckon it is).
