How do I confirm if two images contain the same object? - machine-learning

I need to be able to determine if two images contain the same object. A perfect example would be two photos of a licence plate at different angles.
I've been thinking about OCR (Optical Character Recognition), which would probably get the job done, but I would really like to capitalize on things other than just the text (the oil smudge at the top-right corner of the licence plate, the dent at the bottom, etc.). This led me to feature matching algorithms like SIFT (Scale Invariant Feature Transform).
I also know that the license plates will always use the same font for the characters printed on them (and have the NY state symbol on them), so maybe some machine learning trained on that specific character set is in the cards as well? I'm looking for any and all means to reduce mismatches.
In summary, is there a vendor out there that sells an SDK I can incorporate, or some open-source code, that covers the following:
OCR
Feature Matching
Training Component
Appreciate the help!

There are some good samples that help you learn about license plate recognition methods, for example the Emgu library:
http://www.emgu.com/wiki/index.php/License_Plate_Recognition_in_CSharp
Emgu CV is an open-source image processing and machine learning library (a .NET wrapper for OpenCV).
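If you also want to try the feature-matching side of your question directly, here is a minimal sketch in Python with OpenCV, using SIFT keypoints and Lowe's ratio test (the file names are placeholders, and depending on your OpenCV build SIFT may require the contrib package):

    import cv2

    # Load the two plate photos in grayscale (file names are placeholders)
    img1 = cv2.imread("plate_a.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("plate_b.jpg", cv2.IMREAD_GRAYSCALE)

    # Detect SIFT keypoints and compute descriptors for both images
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # k-NN match the descriptors and keep matches passing Lowe's ratio test
    bf = cv2.BFMatcher()
    matches = bf.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    # Crude decision rule: many surviving matches suggests the same plate
    print(len(good), "good matches")

The ratio-test threshold (0.75) and what counts as "many matches" are things you would have to tune on your own plate images.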
Thanks.

Related

Best approach to build a model to recognise licence plates (ALPR)

I am trying to make a deep learning model to detect and read number plates using techniques like CNNs. I would be building the model in TensorFlow, but I still don't know what the best approach to building such a model would be.
I have checked a few models like this one:
https://matthewearl.github.io/2016/05/06/cnn-anpr/
I have also checked some research papers, but none show the exact way.
So the steps I am planning to follow are:
Image preprocessing using OpenCV (grayscale conversion, transformations, etc.; I don't know much about this part)
Licence plate detection (probably by the sliding window method)
Training a CNN on a synthetic dataset built as in the above link
My questions
Is there any better way to do this?
Can an RNN also be combined after the CNN to handle variable-length numbers?
Should I prefer detecting and recognising individual characters rather than the whole plate?
There are also many older methods that do image preprocessing and then pass the result directly to OCR. Which will be best?
PS: I want to make a commercial real-time system, so I need good accuracy.
Firstly, I don't think combining an RNN and a CNN can achieve a real-time system. I personally prefer detecting individual characters if I want a real-time system, because there will be no more than about 10 characters on a license plate. When handling plates of variable length, detecting individual characters is also more feasible.
Before I learned deep learning, I also tried using OCR to read plates. In my case, OCR was fast but the accuracy was limited, especially when the plate was not clear enough. Even image processing cannot rescue some unclear cases.
So if I were you, I would try the following:
Simple image preprocessing on the whole image
Licence plate detection (probably by the sliding window method)
Image processing (filters and geometric transformations) on the extracted plate region to make it clearer, then separating the characters
Running a CNN on each character (for real-time performance I would try a small CNN, such as the LeNet used on the MNIST handwritten digit data; a rough sketch is below. Multithreading might be needed.)
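For that last step, a minimal LeNet-style character classifier in TensorFlow/Keras could look like the sketch below. The 32x32 grayscale input size and the 36 classes (digits 0-9 plus letters A-Z) are assumptions for illustration, not something fixed by the question:

    import tensorflow as tf
    from tensorflow.keras import layers, models

    NUM_CLASSES = 36  # assumed: digits 0-9 plus letters A-Z

    # Small LeNet-style network: cheap enough for near real-time per-character inference
    model = models.Sequential([
        layers.Input(shape=(32, 32, 1)),        # assumed 32x32 grayscale character crops
        layers.Conv2D(6, 5, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(16, 5, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(120, activation="relu"),
        layers.Dense(84, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_chars, train_labels, epochs=10) on your synthetic character dataset

Because each character crop is tiny, inference per character is fast, which is what makes the per-character approach attractive for real-time use.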
Hope my response can help.

Machine Learning: sign visibility

I work at an airport where we need to determine the visibility conditions of pilots.
To do this, we have signs placed every 200 meters along the runway that allow us to determine how far the visibility extends. We have multiple runways, and the visibility needs to be checked every hour.
Right now the visibility check is done manually with a human being who looks at the photos from the cameras placed at the end of each runway. So it can be tedious.
I'm a programmer who has very little experience with machine learning, but this sounds like an easy problem to automate. How should I approach this problem? Which algorithms should I study? Would OpenCV help me?
Thanks!
I think this can be automated using computer vision techniques, and OpenCV could make the implementation easier. If all the signs are similar, then we can train our program to recognize a sign under specific conditions (lighting). Then we can use the trained classifier to check the visibility of the signs every hour using a simple script.
There is Haar-like feature extraction already in OpenCV. You can use it to train a classifier, which will output an .xml file, and then use that .xml file to detect the signs regularly.
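Using the trained cascade is only a few lines; here is a rough Python/OpenCV sketch (the .xml and image file names are placeholders, and the detection parameters would need tuning for your cameras):

    import cv2

    # Load the cascade produced by the training step (file name is a placeholder)
    sign_cascade = cv2.CascadeClassifier("sign_cascade.xml")

    # Read the hourly runway photo and convert it to grayscale
    frame = cv2.imread("runway_camera.jpg")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect sign candidates; tune scaleFactor/minNeighbors for your setup
    signs = sign_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    print("detected", len(signs), "signs")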
I have done a similar project, RTVTR (Real-Time Vehicle Tracking and Recognition), using OpenCV and it worked great: http://www.youtube.com/watch?v=xJwBT76VEZ4
Answering your questions:
How should I approach this problem?
It depends on the result you want or need to obtain. Is this a "hobby" project (even if job-related), or do you need to build a machine vision system that solves the problem and is compliant with some regulation or standard?
Which algorithms should I study?
I am very interested in your question, but I am not an expert in the field of meteorology, so searching the relevant literature is, for me, a time-consuming task; I reserve the right to update this part of the answer in the future. I think different algorithms will be involved in the solution of the problem: some are very general, for example image segmentation algorithms, and some are very specific, for example how to measure the visibility.
Update: one of the keywords for searching the literature is "meteorological visibility", for example:
HAUTIERE, Nicolas, et al. Automatic fog detection and estimation of visibility distance through use of an onboard camera. Machine Vision and Applications, 2006, 17.1: 8-20.
LENOR, Stephan, et al. An Improved Model for Estimating the Meteorological Visibility from a Road Surface Luminance Curve. In: Pattern Recognition. Springer Berlin Heidelberg, 2013. p. 184-193.
Would OpenCV help me?
Yes, I think OpenCV can help giving you a starting point.
An idea for a naïve algorithm:
Segment the image in order to get the pixel regions belonging to the signs and to the background.
Compute the visibility measure according to some procedure; the measure is computed by a function that takes as input the regions of all the signs and the background region.
The segmentation can be simplified a lot if the signs are always in the same fixed and known position inside the image.
The measure of visibility is obviously the core of the algorithm, and it can be computed in a lot of ways...
You can follow a simple approach where you compute the visibility with a mathematical formula based on the average gray level of the signs and background regions.
You can follow a more sophisticated and machine-learning-oriented approach where you implement an algorithm that mimics your current human-based procedure. In this case your problem can be framed as a supervised learning task: you have a set of training examples, where each training example is a pair composed of a) the photo of the runway (the input) and b) the visibility related to that photo as estimated by a human (the desired output). The system is then trained on the training set, and when you give it a new photo as input it will give you back the visibility measure. I think you have a log of past visibility measures (METAR?), and if you saved the related images too, you already have a relevant amount of data for building a training set and a test set.
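To make the first, simple gray-level approach concrete, here is a rough Python/OpenCV sketch. The sign and background rectangles, the contrast threshold, and the file name are all assumptions for illustration; the 200 m spacing comes from the question:

    import cv2

    img = cv2.imread("runway_camera.jpg", cv2.IMREAD_GRAYSCALE)

    # Assumed, hard-coded sign regions (x, y, w, h), ordered by distance along the runway
    SIGN_ROIS = [(100, 200, 40, 30), (300, 210, 30, 25), (480, 215, 22, 18)]
    BACKGROUND_ROI = (0, 0, 640, 100)  # an assumed patch of background/sky

    bx, by, bw, bh = BACKGROUND_ROI
    bg_mean = img[by:by + bh, bx:bx + bw].mean()

    # Contrast of each sign against the background; the farthest sign whose
    # contrast exceeds the threshold gives a crude visibility distance
    visible_up_to = 0
    for i, (x, y, w, h) in enumerate(SIGN_ROIS):
        contrast = abs(img[y:y + h, x:x + w].mean() - bg_mean) / max(bg_mean, 1.0)
        if contrast > 0.05:  # threshold is a guess; calibrate it against the human readings
            visible_up_to = (i + 1) * 200  # signs are placed every 200 m

    print("estimated visibility:", visible_up_to, "m")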
Update in the age of Convolutional Neural Networks:
YOU, Yang, et al. Relative CNN-RNN: Learning Relative Atmospheric Visibility from Images. IEEE Transactions on Image Processing, 2018.
Both Tensor's and uvts_cvs's replies are very helpful. While OpenCV mainly aims to recognize the sign pattern or even segment it from the background, when you extract the core feature of your problem, visibility, you may still need to include the background signal in your training set. I assume the manual check of visibility is based on image contrast; if so, the signal-to-noise ratio (SNR) or contrast-to-noise ratio (CNR) is a good feature for learning. A threshold can then be defined to classify 'visible' (1) vs. 'invisible' (0). The SNR/CNR can be obtained automatically, especially if the sign position and size are fixed in your camera images.
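A small sketch of such a CNR feature in Python/NumPy; the regions of interest are assumptions, since in practice they come from your fixed camera geometry, and the threshold would have to be calibrated against the human judgements:

    import numpy as np

    def sign_cnr(img, sign_roi, bg_roi):
        """Contrast-to-noise ratio of a sign region against a background region."""
        sx, sy, sw, sh = sign_roi
        bx, by, bw, bh = bg_roi
        sign = img[sy:sy + sh, sx:sx + sw].astype(np.float64)
        bg = img[by:by + bh, bx:bx + bw].astype(np.float64)
        return abs(sign.mean() - bg.mean()) / (bg.std() + 1e-6)

    # Classify one sign as visible (1) or invisible (0) with an assumed threshold:
    # visible = 1 if sign_cnr(gray, (100, 200, 40, 30), (0, 0, 640, 100)) > 2.0 else 0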
Gather a whole bunch of photos and videos and propose it as a challenge on Kaggle. I am sure many people would like to try to solve it, even if the reward would not be very high.
You can use the template matching functionality of OpenCV:
http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
Here the template is the sign. If you manage to find a correct match, then the sign is visible. I think you can also get a sense of the scale of the sign in the image from that code.
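A minimal template-matching sketch in Python with OpenCV (the file names and the match threshold are placeholders to be tuned):

    import cv2

    # Runway photo and a cropped template of the sign (file names are placeholders)
    img = cv2.imread("runway_camera.jpg", cv2.IMREAD_GRAYSCALE)
    template = cv2.imread("sign_template.png", cv2.IMREAD_GRAYSCALE)

    # Normalized cross-correlation; scores close to 1.0 mean a strong match
    result = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)

    if max_val > 0.7:  # threshold is a guess; tune it on your images
        print("sign visible at", max_loc, "with score", max_val)
    else:
        print("sign not found")

Note this only handles one scale; if the sign size varies in the image, you would repeat the match over a small pyramid of resized templates.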
As this is a very controlled and static environment, you have perfect conditions to estimate the visibility with vision-based approaches. Nonetheless, it is not so easy to decide which approach to take. In my thesis, I review this topic in depth for the less well-controlled environment of road traffic. See: LENOR, Stephan. Model-Based Estimation of Meteorological Visibility in the Context of Automotive Camera Systems. 2016. Doctoral thesis. (https://archiv.ub.uni-heidelberg.de/volltextserver/20855/1/20160509_lenor_thesis_final_print.pdf).
I see two major directions you could follow up:
Model-based approaches. Advantages: not so dependent on your very specific setup, and you do not need heavy data collection.
Data-based approaches/ML. Advantages: they can hide the whole complexity of different light and weather conditions; you seem to have a good source of data if people are doing the job manually right now; and they are very promising without much engineering effort (just use a lightweight CNN with a few layers or so).
You could also combine both approaches. If you are still interested in a solution, you can contact me again and I am happy to consult in more depth.

Medical image processing feature descriptors

I'm looking for local and global descriptors for medical image processing. I know about SIFT/SURF/GLOH/HOG, which are mainly applied to computer vision problems, but I would like to know whether they are also applied to medical images to describe features, or whether there are specific descriptors in this field.
I would really appreciate any hint.
Thanks in advance,
Federico
If you want to use standard SIFT for multimodal matching, you have to adjust it a little bit to make it invariant to image inversion.
There is a good paper about it by Kelman et al., "Keypoint Descriptors for Matching Across Multiple Image Modalities and Non-linear Intensity Variations".
There are also more specialized descriptors for multimodal matching; see "An efficient approach for robust multimodal retinal image registration based on UR-SIFT features and PIIFD descriptors" by Ghassabi et al.
I assumed you need the descriptors for matching.
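This is not the adjustment described in the Kelman paper, but a quick and crude workaround for intensity inversion is simply to match against both the second image and its inverted copy and keep the better result; a small Python/OpenCV sketch (file names are placeholders):

    import cv2

    def sift_descriptors(img):
        sift = cv2.SIFT_create()
        return sift.detectAndCompute(img, None)

    # Two images of different modalities (file names are placeholders)
    img_a = cv2.imread("modality_a.png", cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread("modality_b.png", cv2.IMREAD_GRAYSCALE)

    kp_a, des_a = sift_descriptors(img_a)

    bf = cv2.BFMatcher()
    best = []
    # Try the second image as-is and intensity-inverted, keep whichever matches better
    for candidate in (img_b, 255 - img_b):
        kp_b, des_b = sift_descriptors(candidate)
        matches = bf.knnMatch(des_a, des_b, k=2)
        good = [m for m, n in matches if m.distance < 0.75 * n.distance]
        if len(good) > len(best):
            best = good

    print(len(best), "matches after trying both polarities")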
I personally submitted a poster, and it was accepted, on using SIFT as part of the feature detection and matching framework that my work was built around.
The feature detection methods you mentioned are good for general images and will work as a good general initial input for your framework too. Now, since every anatomical region and every modality lives in its own feature domain (i.e. brain regions imaged by MR, liver regions imaged by CT; they all probably imply distinctive landmarks), it is best that you first identify what is unique in or near your target anatomical region, then see whether the aforementioned algorithms can locate those distinctive features (distinctive enough that they have to be in your region and nowhere else), and then find ways to differentiate them from the bag of features that get detected along with them. The resulting sets would be the key features/descriptors that you would like to keep.
So, Yes, many feature detection algorithms have been extensively used for various areas in medical imaging.

Identify pattern in image

What is the best approach to identify a pattern (it could be text, a signature, or a logo; NOT faces, objects, people, etc.) in an image, given that all images are taken from the same angle? This means the pattern to identify will ALWAYS be visible at the same angle, but not at the same position, size, quality, brightness, etc.
Assuming I have the logo, I would like to run a test on 1000 images of different sizes and quality and get those images that have this pattern embedded, or at least a high probability of having this pattern embedded.
Thanks,
Perhaps you can show a couple of images, but template matching (perhaps with a distance transform) seems like an ideal candidate for your problem.
Perl? I'd have suggested using OpenCV with Python or C since you're on the Linux platform.
You could check out SURF and SIFT (there are OpenCV/C++ tutorials with code attached showing how to do this), which can do decent template matching (logos, etc.).
Text detection is a different kettle of fish. I'd suggest the "Robust Text Detection in Natural Images with Edge-Enhanced Maximally Stable Extremal Regions" paper, which is the latest I've seen that does robust text detection from natural scenes without becoming overly intricate.
Training a neural network with the expected patterns seems to be the best way all-round, though the training process will take a long time; actual identification is almost real-time, though.
Here's a discussion on MSER implementation in two libraries: a) OpenCV, b) VLfeat
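If you want to get a feel for MSER quickly, OpenCV's Python binding exposes it directly; a short sketch (the file name is a placeholder, and this only extracts candidate regions, not the full edge-enhanced text-detection pipeline from the paper):

    import cv2

    img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

    # Detect maximally stable extremal regions; text characters often show up
    # as stable regions, which is why MSER is a common first stage for text detection
    mser = cv2.MSER_create()
    regions, bboxes = mser.detectRegions(img)

    # Draw the candidate bounding boxes for inspection
    out = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    for (x, y, w, h) in bboxes:
        cv2.rectangle(out, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 1)
    cv2.imwrite("mser_candidates.jpg", out)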
Have you checked AForgenet.com? It has great libraries for blob processing. It's in .NET.

learning steps for image recognition algorithm

I have decided to spend my personal time after office hours learning the building blocks of how JPEG-type images are parsed and represented on screen. My interest is in object recognition in an image, so I want to know where to start. I know there is math involved in this, so I need step-by-step guidance on which resources on the Internet to look at specifically.
We would need a lot more information on what you want, but take a look at OpenCV:
http://sourceforge.net/projects/opencvlibrary/
to see good examples.
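As a very first step, it can also help to see that a decoded JPEG is nothing more than an array of pixel values; a tiny Python/OpenCV example (the file name is a placeholder):

    import cv2

    # Decode a JPEG into a height x width x 3 array of 8-bit BGR pixel values
    img = cv2.imread("photo.jpg")
    print("shape:", img.shape, "dtype:", img.dtype)

    # Each pixel is just three numbers; this prints the top-left pixel
    print("top-left pixel (B, G, R):", img[0, 0])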
I'd get Ritter's book (warning: costly!) and give it serious study. If you just want to grab existing code and go play, then perhaps you should look at libraries like OpenCV (see Lou's answer).
The ultimate goal of most image processing is to extract information about some high-level and application-dependent objects from an image available in low-level (pixel) form. The objects may be of every day interest like in robotics, cosmic ray showers or particle tracks like in physics, chromosomes like in biology, houses, roads, or differently used agricultural surfaces like in aerial photography or synthetic-aperture radar, etc.
This task of pattern recognition is usually preceded by multiple steps of image restoration and enhancement, image segmentation, or feature extraction, steps which can be described in general terms. The final description in problem-dependent terms, and even more so the eventual image reconstruction, escapes such generality, and the literature of application areas has to be consulted.
