I am working on a real-time image processor for a small-scale self-driving car project for uni. It uses a Raspberry Pi to gather various information and send it to the program, which bases its decisions on that information.
The only stage I have left is to create a neural network which will view the image from the camera (I already have the code to send the array of CV_32F values between 0 and 255, etc.).
I have been scouring the internet and cannot find any example code related to my specific issue, or to this kind of task in general (how to implement a neural network of this kind). So my question is: is it possible to create a NN of this size in C++ without hard-coding it (i.e. utilising OpenCV's capabilities)? It will need 400 input nodes, one for each value of a 20x20 image, and produce 4 outputs: left, right, forward and backward respectively.
How would one create a neural network in OpenCV?
Does OpenCV provide a backpropagation (training) interface/function, or would I have to write this myself?
Once it is trained, am I correct in assuming I can load the neural network using ANN_MLP load etc.? I would then pass each live-stream frame (as an array of values) to it, and it should be able to produce the correct output.
Edit: I have found this: OpenCV image recognition - setting up ANN MLP. It is very simple in comparison to what I want to do, and I am not sure how to adapt it to my problem.
OpenCV is not a neural network framework, so you won't find any advanced features in it. It's far more common to use a dedicated ANN library and combine it with OpenCV. Caffe is a great choice: it is a deep learning framework dedicated to computer vision, it has a C++ API, and it can be combined with OpenCV.
Related
I recently came across the RLlib library: https://docs.ray.io/en/latest/rllib/index.html.
It has amazing features for reinforcement learning, but unfortunately I couldn't find a way to input images as observations without flattening them (I basically want to use a convolutional neural network). Is there any way to feed image observations to models using the RLlib library?
RLlib is compatible with OpenAI's Gym. You can create a custom env (https://docs.ray.io/en/latest/rllib/rllib-env.html#configuring-environments) and use a Box as the observation space, as in https://stackoverflow.com/a/69602365/4994352
I am using the OpenCV library, and I can detect multiple faces in a video file or from a webcam. Now I want to recognize those faces.
If anyone could guide me step by step through what I should do after detecting faces, that would be great. I am using C and C++.
#KISHAN, you may follow a tutorial with an example of using the OpenFace deep learning network. It takes a 96x96 image of a human face and returns a 128-dimensional unit vector called an embedding vector. You can compare two faces by taking the dot product of their embeddings. The network maps faces onto a multidimensional unit sphere, where similar faces are mapped to nearby points.
NOTE: there is a live demo which downloads the models (~35MB) when you press the Start button.
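The dot-product comparison described above can be sketched in a few lines of Python. This is only an illustration: the 3-dimensional vectors stand in for real 128-dimensional embeddings, and the names, the threshold of 0.8 and the helper functions are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity(a, b):
    """Dot product of two unit embeddings: 1.0 means identical direction."""
    return sum(x * y for x, y in zip(a, b))


def best_match(query, known, threshold=0.8):
    """Return the name of the most similar known face, or None if below threshold."""
    name, score = max(
        ((n, similarity(query, e)) for n, e in known.items()),
        key=lambda pair: pair[1],
    )
    return name if score >= threshold else None


# Toy 3-dimensional stand-ins for the 128-dimensional embeddings.
known = {
    "alice": normalize([1.0, 0.1, 0.0]),
    "bob": normalize([0.0, 1.0, 0.2]),
}
query = normalize([0.9, 0.2, 0.0])
print(best_match(query, known))  # alice
```

The threshold decides when a face counts as unknown, and it would have to be tuned on real embeddings.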
In the application I am developing, I have about 5000 product label images (one label per product).
One feature of my application is that a user can take a picture with their camera and get possible matches against the product labels registered in the system.
Since my system initially has only one sample per product, I decided to go with traditional computer vision techniques. I managed to implement this using feature extraction and descriptor matching (using OpenCV's SIFT and FLANN, following this: https://github.com/kipr/opencv/blob/master/samples/cpp/matching_to_many_images.cpp).
Now I am thinking about how to improve the accuracy by combining this with CNN or deep learning techniques, since each time a user approves a match, the system gains another label sample for that product.
Is it possible to build a hybrid image matching system combining Computer Vision techniques and CNN/Deep Learning techniques?
Are there any similar systems already available as services?
You should learn more about Distance Metric Learning (DML). There is a lot of information on the internet, but briefly:
You must get embeddings (vector representations) for each image in your database (e.g. take the feature vector from the last convolutional layer of a modern CNN such as Inception, VGG, ResNet or DenseNet).
Then, when you get a new image, create its vector representation and find the closest vector in your database (by Euclidean distance, for example).
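The lookup step above can be sketched as a nearest-neighbour search. This is a minimal sketch, assuming the embeddings have already been computed by some CNN; the 4-dimensional vectors, the label names and the function name are hypothetical stand-ins.

```python
import math


def rank_by_distance(query, database, top_k=3):
    """Return the top_k (label, distance) pairs closest to the query embedding."""
    scored = [(label, math.dist(query, emb)) for label, emb in database.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Toy 4-dimensional stand-ins for CNN feature vectors (hypothetical values).
database = {
    "label_a": [0.9, 0.1, 0.0, 0.3],
    "label_b": [0.1, 0.8, 0.4, 0.0],
    "label_c": [0.0, 0.2, 0.9, 0.5],
}
query = [0.8, 0.2, 0.1, 0.3]
for label, dist in rank_by_distance(query, database):
    print(f"{label}: {dist:.3f}")
```

With about 5000 labels a linear scan like this is probably fast enough. Approved matches could be stored as additional embeddings for the same product, so the database grows as users confirm results.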
This topic is quite complicated, so study it carefully :)
Good luck!
I have a set of reference images (200) and a set of photos of those images (tens of thousands). I have to classify each photo in a semi-automated way. Which algorithm and open source library would you advise me to use for this task? The best thing for me would be to have a similarity measure between the photo and the reference images, so that I would show to a human operator the images ordered from the most similar to the least one, to make her work easier.
To give a little more context, the reference images are branded packages, and the photos are of the same packages, but with all kinds of noise: reflections from the flash, low light, imperfect perspective, etc. The photos are already (manually) segmented: only the package is visible.
Back in my days with image recognition (like 15 years ago) I would have probably tried to train a neural network with the reference images, but I wonder if now there are better ways to do this.
I recommend that you use Python, and use the NumPy/SciPy libraries for your numerical work. Some helpful libraries for handling images are the Mahotas library and the scikits.image library.
In addition, you will want to use scikits.learn (now scikit-learn), whose SVM module wraps Libsvm, a very standard SVM implementation.
The hard part is choosing your descriptor. The descriptor is the feature vector you compute from each image in order to measure its similarity to the set of reference images. Good things to try would be Histogram of Oriented Gradients, SIFT features, and color histograms. You can also play around with various ways of binning the different parts of the image and concatenating such descriptors together.
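As an illustration of binning parts of the image and concatenating the results, here is a minimal pure-Python sketch of a grid color-histogram descriptor. It assumes the image is a list of rows of (r, g, b) tuples; the function names, the 2x2 grid and the 4 bins per channel are hypothetical choices, and in practice NumPy arrays would be used instead.

```python
def histogram(values, bins=4, max_value=256):
    """Normalized histogram of integer values in the range 0..max_value-1."""
    counts = [0] * bins
    for v in values:
        counts[v * bins // max_value] += 1
    total = len(values)
    return [c / total for c in counts]


def grid_descriptor(image, grid=2, bins=4):
    """Concatenate per-channel histograms of each cell in a grid x grid split.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    """
    height, width = len(image), len(image[0])
    descriptor = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * height // grid, (gy + 1) * height // grid)
            cols = range(gx * width // grid, (gx + 1) * width // grid)
            cell = [image[y][x] for y in rows for x in cols]
            for channel in range(3):
                descriptor += histogram([px[channel] for px in cell], bins)
    return descriptor


# A 4x4 toy image: left half pure red, right half pure blue.
red, blue = (255, 0, 0), (0, 0, 255)
image = [[red, red, blue, blue] for _ in range(4)]
desc = grid_descriptor(image)
print(len(desc))  # 2*2 cells * 3 channels * 4 bins = 48
```

Because each cell is described separately, the descriptor keeps a coarse idea of where colors appear, which a single whole-image histogram would lose.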
Next, set aside some of your data for training. For these data, you have to manually label them according to the true reference image they belong to. You can feed these labels into built-in functions in scikits.learn and it can train a multiclass SVM to recognize your images.
After that, you may want to look at MPI4Py, an implementation of MPI in Python, to take advantage of multiprocessors when doing the large descriptor computation and classification of the tens of thousands of remaining images.
The task you describe is very difficult and solving it with high accuracy could easily lead to a research-level publication in the field of computer vision. I hope I've given you some starting points: searching any of the above concepts on Google will hit on useful research papers and more details about how to use the various libraries.
The best thing for me would be to have a similarity measure between the photo and the reference images, so that I would show to a human operator the images ordered from the most similar to the least one, to make her work easier.
One way people do this is with the so-called "Earth mover's distance". Briefly, one imagines each pixel in an image as a stack of rocks with height corresponding to the pixel value and defines the distance between two images as the minimal amount of work needed to transfer one arrangement of rocks into the other.
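For intuition, here is the one-dimensional special case in Python, for example for comparing two intensity histograms. This is a sketch under stated assumptions: both histograms have the same total mass and the cost of moving mass between adjacent bins is 1. Under those assumptions the minimal work equals the sum of absolute differences between the cumulative histograms. The general 2-D image case needs a transportation solver such as the code linked in this answer; the function name here is hypothetical.

```python
def emd_1d(hist_a, hist_b):
    """Earth mover's distance between two 1-D histograms of equal total mass.

    With unit ground distance between adjacent bins, the minimal work equals
    the sum of absolute differences between the cumulative histograms.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    work = 0.0
    carried = 0.0  # mass that still has to be moved past the current bin
    for a, b in zip(hist_a, hist_b):
        carried += a - b
        work += abs(carried)
    return work


# Moving one unit of mass across two bins costs 2.
print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0
```

Unlike a bin-by-bin difference, this distance grows with how far the mass has to travel, so a slightly shifted histogram counts as closer than a very different one.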
Algorithms for this are a current research topic. Here's some MATLAB code for one: http://www.cs.huji.ac.il/~ofirpele/FastEMD/code/ . It looks like they have a Java version as well. Here's a link to the original paper and C code: http://ai.stanford.edu/~rubner/emd/default.htm
Try RapidMiner (one of the most widely used data-mining platforms, http://rapid-i.com) with IMMI (Image Mining Extension, http://www.burgsys.com/mumi-image-mining-community.php), AGPL licence.
It currently implements several similarity measurement methods (not only trivial pixel-by-pixel comparison). The similarity measures can be used as input for a learning algorithm (e.g. neural network, KNN, SVM, ...), which can be trained to give better performance. Some information about the methods is given in this paper:
http://splab.cz/wp-content/uploads/2012/07/artery_detection.pdf
Nowadays, deep-learning frameworks such as Torch, TensorFlow, Theano and Keras are the best open-source tools/libraries for object classification/recognition tasks.
I have to make an application which recognizes road signs. I saw that in the OpenCV folder there are some XML files for facial recognition, but I do not know what the numbers in the XML represent or how those values were obtained. I need to understand this so that I can create my own XML files for road sign recognition.
I do not know much about OpenCV; however, I completed my final-year project on face recognition using neural networks. Basically, I used an algorithm to extract the facial portion from a given image. I then fed that new image (containing only the face) to a neural network that I developed using Matlab. After rigorous improvements it was a success, and by using the simulation feature of Matlab it was possible to precisely identify the individual.
Therefore I strongly recommend that you follow the same technique in carrying out this task.
I managed to find some interesting articles related to this topic: here, here, here and here.
What you need is two steps:
detection step
recognition step
For detection, I suggest you use the cascade classifier that is included with OpenCV. It is robust and quicker than the haar trainer. In this step you train a classifier to detect the traffic signs. I found this tutorial, which may help you prepare your training data.
With this step you detect your signs. It may also detect some false objects in the image; you can eliminate these undesired objects with some extra processing, such as checking their aspect ratio or color, or by adding more negative images to the training set.
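The ratio check mentioned above can be sketched as a small filter over the detector's bounding boxes. This is a minimal sketch, assuming boxes come as (x, y, w, h) tuples and that signs have roughly square bounding boxes; the function name and all thresholds are hypothetical and would need tuning on real detections.

```python
def filter_detections(boxes, min_ratio=0.8, max_ratio=1.25, min_size=16):
    """Keep (x, y, w, h) boxes that are roughly square and not too small."""
    kept = []
    for x, y, w, h in boxes:
        if w < min_size or h < min_size:
            continue  # too small to be a readable sign
        ratio = w / h
        if min_ratio <= ratio <= max_ratio:
            kept.append((x, y, w, h))
    return kept


# Hypothetical detector output: one plausible sign, one wide strip, one speck.
boxes = [(10, 20, 40, 42), (100, 50, 120, 30), (5, 5, 8, 8)]
print(filter_detections(boxes))  # [(10, 20, 40, 42)]
```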
For recognition, I suggest you follow OpenCV's tutorial dedicated to face recognition.
You won't need many modifications here.