OPENCV Best way to handle a game screenshot - opencv

I want to build an application that counts game statistics automatically. For that, I need some form of computer vision to process screenshots of the game.
There are a bunch of regions showing different skills, always in the same place, that the app needs to recognize. I assume it would need a database of pictures or perhaps some trained samples.
I've started learning the OpenCV library, but I'm not sure which approach is best for this purpose.
Could you please give me some hints or suggest algorithms that I could use?
Here is an example of a game screenshot.

You can convert it to grayscale and then use a Haar cascade classifier to read the words in the image, then save the results to a file (for example CSV). That way you can use your game screenshots to gather data for training your models.
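Because the question says the skill regions are always in the same place, a simpler option than a trained cascade may be to crop each known region and compare it against a small set of stored icon images. The sketch below is only an illustration of that idea, not the method from the answer above: it uses plain NumPy, synthetic data, and hypothetical names (`fireball`, `shield`, the slot position), and it scores candidates by mean squared difference.

```python
import numpy as np

def to_gray(image_rgb):
    """Convert an H x W x 3 RGB array to grayscale using common luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return image_rgb.astype(np.float64) @ weights

def best_match(region_gray, templates):
    """Return the template name with the smallest mean squared difference."""
    scores = {
        name: float(np.mean((region_gray - tmpl) ** 2))
        for name, tmpl in templates.items()
    }
    return min(scores, key=scores.get), scores

# Hypothetical data: two 8x8 grayscale skill icons (the "database of pictures").
rng = np.random.default_rng(0)
fireball = rng.uniform(0, 255, size=(8, 8))
shield = rng.uniform(0, 255, size=(8, 8))
templates = {"fireball": fireball, "shield": shield}

# Hypothetical 40x40 screenshot with the shield icon in a known slot.
screenshot = np.zeros((40, 40, 3), dtype=np.uint8)
top, left = 10, 20  # assumed fixed position of the skill slot
screenshot[top:top + 8, left:left + 8] = shield.astype(np.uint8)[..., None]

# Crop the fixed region, convert to grayscale, and find the closest icon.
region = to_gray(screenshot)[top:top + 8, left:left + 8]
name, scores = best_match(region, templates)
print(name, scores)
```

With real screenshots you would load the images from disk and probably allow a small tolerance for compression artifacts; OpenCV's `cv2.matchTemplate` covers the case where the position can shift slightly.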

Related

How to recognize or match two images?

I have one image stored in my app bundle.
Now I want to scan images with the camera and compare them with my locally stored image. When the image matches, I want to play a video, and if the user moves the camera away from that image, I want to stop the video.
I tried the Wikitude SDK for iOS, but it does not work properly: it crashes at random, because of memory issues or for some other reason.
I also considered Core ML and ARKit. But Core ML detects properties of an image such as name, type and colors, whereas I want to match the image itself. ARKit does not support all devices and iOS versions, and I don't know whether it can do image matching as I require.
If anybody has an idea how to achieve this, please share. Any help will be appreciated. Thanks :)
The easiest way is ARKit's image detection. You already know the limits on which devices it supports, but the results it gives are broad and it is really easy to implement. Here is an example
Next is Core ML, which is the hardest way. You need to understand machine learning, at least briefly. Then comes the tough part: training with your dataset. The biggest drawback is that you have a single image. I would discard this method.
Finally, the middle-ground solution is to use OpenCV. It might be hard, but it suits your need. You can find different feature-matching methods to find your image in the camera feed. example here. You can use Objective-C++ to write C++ code for iOS.
Your task is image similarity, which you can do simply and with more reliable results using machine learning. Since your task involves camera scanning, the better option is Core ML. You can refer to this link by Apple on image similarity. You can improve your results by training with your own datasets. If you need any more clarification, leave a comment.
Another approach is to use a so-called "siamese network". This really means that you run both images through a model such as Inception-v3 or MobileNet and compare their outputs.
However, these models usually give a classification output, i.e. "this is a cat". But if you remove that classification layer from the model, it gives an output that is just a bunch of numbers that describe what sort of things are in the image but in a very abstract sense.
If these numbers for two images are very similar -- if the "distance" between them is very small -- then the two images are very similar too.
So you can take an existing Core ML model, remove the classification layer, run it twice (once on each image), which gives you two sets of numbers, and then compute the distance between these numbers. If this distance is lower than some kind of threshold, then the images are similar enough.
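The comparison step can be sketched in a few lines. The example below assumes you already have the two feature vectors from the model; the vectors, the choice of cosine distance, and the threshold of 0.2 are all made up for illustration, and a real threshold would need tuning on your own images.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_match(feat_a, feat_b, threshold=0.2):
    """Treat two images as similar when their feature distance is small."""
    return cosine_distance(feat_a, feat_b) < threshold

# Hypothetical feature vectors (real ones have hundreds or thousands of values).
reference = [0.9, 0.1, 0.4, 0.0]    # the stored image
same_scene = [0.8, 0.2, 0.5, 0.1]   # camera frame showing the same image
other_scene = [0.0, 0.9, 0.1, 0.7]  # camera frame showing something else

print(is_match(reference, same_scene))
print(is_match(reference, other_scene))
```

Euclidean distance works the same way; cosine distance is used here only because it ignores the overall magnitude of the feature vectors.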

Making a trained model (machine learning) from 3D models

I have a database of almost 20k 3D files. They are drawings of machine parts designed in CAD software (SolidWorks). I'm trying to build a trained model from all of these 3D models, so that I can build a 3D object recognition app where someone can take a picture of one of these parts (in the real world) and the app provides useful information about material, size, treatment and so on.
If anyone has already done something similar, any information you can provide would be greatly appreciated!
Some ideas:
1) Several pictures: use several instead of only one. As Rodrigo commented, and as Brad Larson tried to circumvent with his method, the problem with the user taking only one picture as input is that you necessarily lack the information needed to triangulate and form a 3D point cloud. With 4 pictures taken from slightly different angles, you can already reconstruct parts of the object. Comparing point clouds would make the task much easier for any ML algorithm, whether neural networks (NN), support vector machines (SVM) or others. A common standard for storing point clouds is ASTM E2807, which defines the E57 file format.
On the downside, a 3D vision algorithm might be heavy on the user's device and is not the easiest to implement.
2) Artificial picture training: By training on pre-computed artificial pictures, as Brad Larson suggested, you take over much of the computation, to the user's benefit. Be aware that you should probably use "features" extracted from the pictures, not the complete pictures, both to train and to classify. The problem with this method is that it may be very sensitive to lighting and background context. You should take care to produce CAD pictures with the same lighting conditions for all objects, so that the classifier doesn't overfit to aspects of the "pictures" that do not belong to the object.
This is where solution 1) is much more stable: it is less sensitive to the visual context.
3) Scale: The size of your object is an important descriptor. You should thus add scale information to your object descriptor before training. You could ask the user to take pictures with a reference object. Alternatively you can ask the user to make a rule-of-thumb estimate of the object size ("What are the approximate dimensions of the object, in [cm]?"). Providing size could make your algorithm significantly faster and more accurate.
If your test data in production is mainly images of the 3D object, then the method in the comment section by Brad Larson is the better approach and it is also easier to implement and takes a lot less effort and resources to get it up and running.
However, if you want to classify between 3D models, there are existing networks that classify 3D point clouds. You will have to convert your models to point clouds and use them as training samples. One of those networks, which I have used, is VoxNet. I also suggest adding more variation to the training data, such as different rotations of the 3D model.
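As a rough sketch of the rotation idea, the snippet below rotates a point cloud about the vertical axis and turns each rotated copy into a binary occupancy grid, which is the kind of input a voxel-based network expects. The grid size of 16, the 12 rotation steps, and the random cloud are assumptions chosen for illustration; a real pipeline would sample points from the CAD meshes.

```python
import numpy as np

def rotate_z(points, angle_rad):
    """Rotate an (N, 3) point cloud about the z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T

def voxelize(points, grid=16):
    """Mark every grid cell that contains at least one point."""
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()
    extent = extent if extent > 0 else 1.0  # avoid dividing by zero
    idx = np.floor((points - mins) / extent * (grid - 1)).astype(int)
    occupancy = np.zeros((grid, grid, grid), dtype=bool)
    occupancy[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return occupancy

def augment(points, n_rotations=12, grid=16):
    """Return one occupancy grid per evenly spaced rotation."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_rotations, endpoint=False)
    return np.stack([voxelize(rotate_z(points, a), grid) for a in angles])

# Hypothetical point cloud standing in for points sampled from one CAD model.
rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(500, 3))
batch = augment(cloud)
print(batch.shape)
```

Each part then contributes several training samples instead of one, all sharing the same label.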
You can use pre-trained 3D deep neural networks; there are many such networks that could help with your work and can produce high accuracy.

Comparing images using OpenCv or something more useful

I need to compare two images in a project.
The images would be of two fruits of the same kind, say two different images of two different apples.
To be clearer, the database will hold images of the stages an apple goes through from the day it is picked from the tree until it rots.
The user would upload an image of their apple, and the software should compare it to all the images in the database, retrieve the data for the matching image and tell the user which stage the apple is at.
I have compared images before using OpenCV via Emgu CV, but I don't know enough to say whether it's the best way.
I need expert advice: is what I described even possible? Or will all the database images match the user's image?
And is this "image processing" or something else?
And are there any suggested tutorials for learning how to do this?
I know it's not totally clear yet, but it's just a crazy idea, and I'd like to find out how I can bring it to life!
N.B. The project will be an Android application.
This is an example of a supervised image classification problem, which is a pretty broad field. You can read up on image classification here.
The way that you would approach this problem would be to define a few stages of decay (fresh, starting to rot, half rotten, completely rotten), put together a dataset of many images of the fruit in each stage, and train an image classifier on each stage. The sample dataset should contain images of many different pieces of fruit in many different settings. If you want to support different types of fruit, you would need to train a different classifier for each fruit.
There are many image classification tools out there. To name a few:
OpenCV's Haar classifier
dlib's HOG classifier
Matlab's Computer Vision System Toolbox
VLFeat
It would be up to you to look into which approach would work best for your situation.
Given that this is a fairly broad problem, I wouldn't expect to come up with a solid solution quickly unless you've had experience with image classification. If you are trying to develop a product, I would recommend getting in touch with a computer vision expert that you could contract to solve it.
If you are just looking to learn more about image classification, however, this could be a fun way to play around with different tools and get a feel for what's out there. You may want to start by learning about Machine Learning in general. Caltech offers a free online course that gives a pretty good intro to the subject.

Track shifting and zooming object between frames

I'm trying to track cars using video from a dash cam. Most of the time I see:
slight shifting of the vehicle in front of me
brake lights turning on and off
zooming in when it brakes
zooming out when it accelerates
Which algorithm would be best for this case? Of course, I could just run OpenCV, but I want to understand how it works.
Thank you!
I think that for your task you can use the Haar Cascade Classifier. It is a machine learning based approach where a cascade function is trained from a lot of positive and negative images. It is then used to detect objects in other images.
OpenCV has a good implementation, with both the trainer and the detector.
On the web you can even find a lot of .xml files, which are the output of the training step, and use them to run the detection directly.
However, I'm not sure you can find such files for detecting the back of a car.
At this link you can learn the basics of the method and see how to use it in OpenCV: http://docs.opencv.org/master/d7/d8b/tutorial_py_face_detection.html#gsc.tab=0
In this case you don't need the 4 features that you listed, but you could perhaps use them with another algorithm at the end of the Haar cascade pipeline as a double check.

Eigenfaces algorithm

I am writing a face recognition program using OpenCV.
When generating the eigenfaces:
do I need to use a big database of unknown faces?
do I need to use only photos of the people I want my system to recognize?
do I need to use both?
I am talking about the eigenfaces generation, which is the "learning" step.
And how many photos do I need to get decent accuracy? More like 20, or 2000?
Thanks
Eigenfaces works by projecting the faces into a particular "face basis" using principal component analysis or PCA. The basis does not have to include photos of people you want to recognize.
Instead, I would encourage you to train on a big database (at least 10k faces) that is well registered (eigenfaces doesn't work well with images that are shifted). The original paper by Turk and Pentland was remarkable partly due to the large pin registered face database they released. I would also try to normalize the lighting so that it is consistent between the database and your test inputs.
In terms of testing, the first 20 components should be sufficient to reconstruct a human-recognizable face, and the first 100 components should be enough to discriminate between any two faces, even for an essentially arbitrarily large dataset.
You don't need too many random faces to compose a human face; somewhere close to 20 should give good results, though use more if you can. They should all be aligned with one another as closely as possible, front facing, and in grayscale under the same lighting conditions.
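To make the "learning" step concrete, here is a minimal NumPy sketch of the projection idea described above. It is not OpenCV's face recognizer: the random arrays stand in for aligned grayscale face images, and the names (`alice`, `bob`), image size, and component count of 20 are assumptions for illustration only.

```python
import numpy as np

def fit_eigenfaces(faces, n_components):
    """faces: (n_samples, n_pixels) array of aligned grayscale faces."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Rows of vt are the principal directions ("eigenfaces") in pixel space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:n_components]

def project(face, mean_face, eigenfaces):
    """Coordinates of a face in the eigenface basis."""
    return (face - mean_face) @ eigenfaces.T

def recognize(face, gallery_coords, labels, mean_face, eigenfaces):
    """Return the label of the nearest gallery face in eigenface space."""
    coords = project(face, mean_face, eigenfaces)
    dists = np.linalg.norm(gallery_coords - coords, axis=1)
    return labels[int(np.argmin(dists))]

rng = np.random.default_rng(0)
n_pixels = 32 * 32

# Learning step: build the basis from a database of "unknown" faces.
training = rng.normal(size=(200, n_pixels))
mean_face, eigenfaces = fit_eigenfaces(training, n_components=20)

# Enrollment: the people to recognize need not be in the training set.
alice = rng.normal(size=n_pixels)
bob = rng.normal(size=n_pixels)
labels = ["alice", "bob"]
gallery = np.stack([project(f, mean_face, eigenfaces) for f in (alice, bob)])

# Recognition: a slightly noisy new photo of alice.
probe = alice + 0.05 * rng.normal(size=n_pixels)
who = recognize(probe, gallery, labels, mean_face, eigenfaces)
print(who)
```

The split between `training` and `gallery` mirrors the point made above: the basis can come from any large, well-aligned face set, while the people you want to recognize only need to be projected into it.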
