Does Google Prediction API work for image recognition - image-processing

I read the official documentation for the API, but I wanted to make sure that it's possible for it to perform object recognition in images. More specifically, my idea is to provide a lot of images of parking lots together with the number of parking spots currently available in each, and to get a model that predicts how many spots are available given an image of the parking lot.
Does anybody have previous experience with using the API for a similar goal?

No, I don't think the Google Prediction API will work for image recognition, because the Prediction API only accepts numeric and string features.
For image recognition, the Google Vision API is a better fit. I don't think it can recognize specific humans or persons, but it can recognize places like the Eiffel Tower and so on.
It can even read strings written in the image.

Related

How to recognize or match two images?

I have one image stored in my bundle or in the application.
Now I want to scan images with the camera and compare them with my locally stored image. When the image is matched I want to play a video, and if the user moves the camera away from that particular image I want to stop the video.
For that I have tried the Wikitude SDK for iOS, but it is not working properly: it keeps crashing because of memory issues or other reasons.
Core ML and ARKit also came to mind, but Core ML detects an image's properties like name, type, colors, etc., whereas I want to match the image. ARKit does not support all devices and iOS versions, and I don't know whether image matching as per my requirement is even possible with it.
If anybody has an idea of how to achieve this requirement, please share. Every bit of help will be appreciated. Thanks :)
The easiest way is ARKit's image detection. You know the limitations of the devices it supports, but the results it gives are solid and it is really easy to implement. Here is an example.
Next is Core ML, which is the hardest way. You need to understand machine learning, at least briefly. Then comes the tough part: training with your dataset. The biggest drawback is that you have a single image, so I would discard this method.
Finally, a middle-ground solution is OpenCV. It might be harder, but it suits your need. You can use various feature-matching methods to find your image in the camera feed (example here), and you can use Objective-C++ to write the C++ code for iOS.
Your task is image similarity, which you can do simply and with more reliable results using machine learning. Since you are scanning with the camera, Core ML is the better option. You can refer to this link by Apple on Image Similarity, and you can improve your results by training with your own datasets. Comment if you need any more clarification.
Another approach is to use a so-called "siamese network", which really means that you run both images through a model such as Inception-v3 or MobileNet and compare their outputs.
However, these models usually give a classification output, i.e. "this is a cat". But if you remove that classification layer from the model, it gives an output that is just a bunch of numbers that describe what sort of things are in the image but in a very abstract sense.
If these numbers for two images are very similar -- if the "distance" between them is very small -- then the two images are very similar too.
So you can take an existing Core ML model, remove the classification layer, run it twice (once on each image), which gives you two sets of numbers, and then compute the distance between these numbers. If this distance is lower than some kind of threshold, then the images are similar enough.
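The distance step above can be sketched independently of the model. This is a minimal NumPy illustration, assuming the two embeddings have already been extracted from the truncated model; the vectors and the 0.2 threshold are made up for the example:

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two embedding vectors (0 = same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def images_match(emb1, emb2, threshold=0.2):
    """Declare two images similar when their embedding distance is below the threshold."""
    return cosine_distance(emb1, emb2) < threshold

# Toy embeddings standing in for the model's penultimate-layer output:
ref = np.array([0.9, 0.1, 0.4])
same_scene = np.array([0.88, 0.12, 0.38])   # nearly parallel -> small distance
different = np.array([0.05, 0.95, 0.1])     # points elsewhere -> large distance

print(images_match(ref, same_scene))  # True
print(images_match(ref, different))   # False
```

In practice the threshold has to be tuned on pairs of images you consider "same" and "different", since the useful range depends on the model you take the embeddings from.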

How to do segmentation based on some filters (e.g. traffic signals) from live streaming data

I am supposed to recognize traffic symbols from live streaming data. I can recognize the symbols from already-segmented data using neural networks, but I am stuck on the segmentation part, and I would like to automate it.
I have tried YOLO, but I think I am missing something.
I have also tried OpenCV.
Please help.
INPUT IMAGE FRAME FROM LIVE STREAM
OUTPUT
I would suggest you follow this link:
https://github.com/AlexeyAB/darknet/tree/47c7af1cea5bbdedf1184963355e6418cb8b1b4f#how-to-train-pascal-voc-data
It's very simple to follow. You basically need to do two steps: install darknet, and create the data you want to use (road signs in your case).
So follow the installation guide, then try to find a dataset of road signs, or use or create your own. You will need the annotation files as well (you can generate them yourself easily if you use your own dataset; this is explained in the link). You don't need a huge number of pictures, because darknet augments the images automatically (just resizing, though). If you start from a pretrained model you should get "ok" results pretty fast, after roughly 500 iterations.
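For reference, darknet expects one annotation file per image, with one line per object in the form `<class> <x_center> <y_center> <width> <height>`, all normalized to the image size. A minimal sketch of the conversion from pixel-space boxes (the function name and the example box are made up):

```python
def to_yolo_annotation(class_id, box, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) into a darknet/YOLO
    annotation line: '<class> <x_center> <y_center> <width> <height>',
    with all coordinates normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A sign occupying pixels (100, 150)-(300, 350) in a 640x480 frame:
print(to_yolo_annotation(0, (100, 150, 300, 350), 640, 480))
# -> "0 0.312500 0.520833 0.312500 0.416667"
```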

Add 2D or 3D Face Filters like MSQRD/SnapChat Using Google Vision API for iOS

Here's some research I have done so far:
- I have used the Google Vision API to detect various face landmarks. Here's the reference: https://developers.google.com/vision/introduction
- Here's a link to sample code that obtains the facial landmarks using the same Google Vision API: https://github.com/googlesamples/ios-vision
- I have gone through various blog posts on the internet which say MSQRD is based on Google's Cloud Vision. Here's a link to one: https://medium.com/@AlexioCassani/how-to-create-a-msqrd-like-app-with-google-cloud-vision-802b578b30a0
- For Android, here's a reference: https://www.raywenderlich.com/158580/augmented-reality-android-googles-face-api
- There are multiple paid SDKs which fulfill the purpose, but they are highly priced, so I can't afford them. For instance:
  1) https://deepar.ai/contact/
  2) https://www.luxand.com/
There is a possibility that some will see this question as a duplicate of:
Face filter implementation like MSQRD/SnapChat
But that thread is almost 1.6 years old, with no correct answers.
I have gone through this article:
https://dzone.com/articles/mimic-snapchat-filters-programmatically-1
It describes all the essential steps to achieve the desired results, but they advise using their own SDK.
As per my research, there is no good material available that helps achieve results like MSQRD's face filters.
There is one more GitHub repository with a similar implementation, but it doesn't give much information about it:
https://github.com/rootkit/LiveFaceMask
Now my question is:
If we have the facial landmarks from the Google Vision API (or even from dlib), how can I add 2D or 3D models over them? In which format does this need to be done? Does it require X,Y coordinates with vertex calculations?
NOTE: I have gone through Google's "GooglyEyesDemo", which adds a preview layer over the eyes. It basically adds a view over the face, and I don't want to add one-dimensional UIView preview layers over it. Image attached for reference:
https://developers.google.com/vision/ios/face-tracker-tutorial
Creating models: I also want to know how to create models for live filters like MSQRD. I welcome any software or format recommendations.
I hope the research I have done will help others, and that someone else's experience helps me achieve the desired results. Let me know if any more details are required.
Image attached for more reference:
Thanks
Harry
The Canvas class is used on Android for drawing such 2D/3D models; on iOS, Core Graphics can be used.
What you can do is detect the face components, take their location points, and draw images on top of them. Consider going through this.
You need to either predict x, y, z coordinates (check out this demo), or use x, y predictions and then find the parameters of a generic 3D model and camera that give the closest projection of the current x, y.
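Platform aside, the "draw images on top of the location points" step boils down to alpha-blending a small filter image onto the camera frame, centered on a landmark. A toy NumPy sketch of that idea (the arrays and the landmark point are made up, and bounds checking is omitted for brevity):

```python
import numpy as np

def overlay_at_landmark(frame, sticker, alpha, center_x, center_y):
    """Alpha-blend a small RGB 'sticker' onto 'frame', centered on a landmark point.
    frame: HxWx3 array, sticker: hxwx3 array, alpha: hxw array in [0, 1]."""
    h, w = sticker.shape[:2]
    top = center_y - h // 2
    left = center_x - w // 2
    region = frame[top:top + h, left:left + w].astype(float)
    a = alpha[..., None]  # broadcast alpha over the color channels
    frame[top:top + h, left:left + w] = (a * sticker + (1 - a) * region).astype(frame.dtype)
    return frame

# Toy example: black 10x10 frame, fully opaque white 4x4 sticker at a "nose" landmark (5, 5)
frame = np.zeros((10, 10, 3), dtype=np.uint8)
sticker = np.full((4, 4, 3), 255, dtype=np.uint8)
alpha = np.ones((4, 4))
out = overlay_at_landmark(frame, sticker, alpha, 5, 5)
print(out[5, 5])  # the landmark pixel is now white: [255 255 255]
```

A real filter would use the sticker's own alpha channel and scale/rotate it to the face pose, but the blending math is the same.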

Accent detection API?

I've been doing some research on the feasibility of building a mobile/web app that allows users to say a phrase and detects the accent of the user (Boston, New York, Canadian, etc.). There will be about 5 to 10 predefined phrases that a user can say. I'm familiar with some of the speech-to-text APIs that are available (Nuance, Bing, Google, etc.), but none seem to offer this additional functionality. The closest examples that I've found are Google Now and Microsoft's Speaker Recognition API:
http://www.androidauthority.com/google-now-accents-515684/
https://www.microsoft.com/cognitive-services/en-us/speaker-recognition-api
Because there are going to be 5-10 predefined phrases I'm thinking of using a machine learning software like Tensorflow or Wekinator. I'd have initial audio created in each accent to use as the initial data. Before I dig deeper into this path I just wanted to get some feedback on this approach or if there are better approaches out there. Let me know if I need to clarify anything.
There is no public API for such a rare task.
Accent detection, like language detection, is commonly implemented with i-vectors. A tutorial is here; an implementation is available in Kaldi.
You need a significant amount of data to train the system, even if your sentences are fixed. It might be easier to collect accented speech without focusing on the specific sentences you have.
An end-to-end TensorFlow implementation is also possible, but it would probably require too much data, since you need to separate speaker-intrinsic characteristics from accent-intrinsic ones (basically performing the factorization that i-vectors do). You can find descriptions of similar works like this and this one.
You could use (this is just an idea; you will need to experiment a lot) a neural network with as many outputs as you have possible accents, with a softmax output layer and a cross-entropy cost function.
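To make that suggestion concrete, here is a minimal NumPy sketch of a softmax output layer and cross-entropy cost over a handful of hypothetical accent classes (the logit values are made up; a real system would produce them from learned audio features):

```python
import numpy as np

def softmax(logits):
    """Turn raw network outputs into a probability distribution over accents."""
    z = logits - np.max(logits)  # subtract the max to stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(probs, true_index):
    """Cost for one example: negative log-probability of the correct accent."""
    return -np.log(probs[true_index])

# Hypothetical final-layer outputs for 3 accent classes (Boston, New York, Canadian):
logits = np.array([2.0, 0.5, -1.0])
probs = softmax(logits)
print(probs.argmax())           # predicted accent index: 0
print(cross_entropy(probs, 0))  # low cost when the prediction is right
```

Training then just means adjusting the network weights to push this cost down over many labeled recordings.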

Sketch-based Image Retrieval with OpenCV or LIRe

I'm currently reading for BSc Creative Computing with the University of London and I'm in my last year of my studies. The only remaining module I have left in order to complete the degree is the Project.
I'm very interested in the area of content-based image retrieval and my project idea is based on that concept. In a nutshell, my idea is to help novice artists in drawing sketches in perspective with the use of 3D models as references. I intend to achieve this by rendering the side/top/front views of each 3D model in a collection, pre-process these images and index them. While drawing, the user gets a series of models (that have been pre-processed) that best match his/her sketch, which can be used as guidelines to further enhance the sketch. Since this approach relies on 3D models, it is also possible for the user to rotate the sketch in 3D space and continue drawing based on that perspective. Such approach could help comic artists or concept designers in quickly sketching their ideas.
While carrying out my research I came across LIRe and I must say I was really impressed. I've downloaded the LIRe demo v0.9 and played around with the included sample. I've also developed a small application which automatically downloads, indexes and searches for similar images in order to better understand the inner workings of the engine. Both approaches returned very good results, even with a limited set of images (~300).
Next experiment was to test the output response when a sketch rather than an actual image is provided as input. As mentioned earlier, the system should be able to provide a set of matching models based on the user's sketch. This can be achieved by matching the sketch with the rendered images (which are of course then linked to the 3D model). I've tried this approach by comparing several sketches to a small set of images and the results were quite good - see http://claytoncurmi.net/wordpress/?p=17. However when I tried with a different set of images, results weren't as good as the previous scenario. I used the Bag of Visual Words (using SURF) technique provided by LIRe to create and search through the index.
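For what it's worth, the core matching idea can be prototyped without LIRe. Below is a toy global descriptor for binary sketch/edge images: the fraction of "ink" per grid cell, compared by mean absolute difference. This is only an illustration, far simpler than Bag of Visual Words with SURF; the grid size and the test "sketches" are made up:

```python
import numpy as np

def edge_descriptor(image, grid=4):
    """Toy sketch descriptor: fraction of nonzero ('ink') pixels in each cell
    of a grid x grid partition of a binary sketch or edge map."""
    h, w = image.shape
    desc = np.zeros(grid * grid)
    for i in range(grid):
        for j in range(grid):
            cell = image[i * h // grid:(i + 1) * h // grid,
                         j * w // grid:(j + 1) * w // grid]
            desc[i * grid + j] = np.count_nonzero(cell) / cell.size
    return desc

def similarity(d1, d2):
    """Higher is more similar; 1.0 means identical descriptors."""
    return 1.0 - np.abs(d1 - d2).mean()

# Two 16x16 'sketches': vertical lines in the same place vs. a horizontal one
a = np.zeros((16, 16)); a[:, 8] = 1
b = np.zeros((16, 16)); b[:, 8] = 1
c = np.zeros((16, 16)); c[8, :] = 1
print(similarity(edge_descriptor(a), edge_descriptor(b)))        # 1.0
print(similarity(edge_descriptor(a), edge_descriptor(c)) < 1.0)  # True
```

Indexing the rendered model views with such descriptors (or real ones like SURF-based visual words) and ranking by similarity is essentially what the retrieval engine does under the hood.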
I'm also trying out some sample code that comes with OpenCV (I've never used this library and I'm still finding my way).
So, my questions are;
1. Has anyone tried implementing a sketch-based image retrieval system? If so, how did you go about it?
2. Can LIRe/OpenCV be used for sketch-based image retrieval? If so, how can this be done?
PS. I've read several papers about this subject, however I didn't find any documentation about the actual implementation of such system.
Any help and/or feedback is greatly appreciated.
Regards,
Clayton
